Abstract. A model for statistical ranking is a family of probability distributions whose
states are orderings of a fixed finite set of items. We represent the orderings as maximal
chains in a graded poset. The most widely used ranking models are parameterized by
rational functions in the model parameters, so they define algebraic varieties. We study
these varieties from the perspective of combinatorial commutative algebra. One of our
models, the Plackett-Luce model, is non-toric. Five others are toric: the Birkhoff model,
the ascending model, the Csiszar model, the inversion model, and the Bradley-Terry model.
For these models we examine the toric algebra, its lattice polytope, and its Markov basis.



1. Introduction 

A statistical model for ranked data is a family $\mathcal{M}$ of probability distributions on the
symmetric group $\mathfrak{S}_n$. Each distribution $p(\theta)$ in $\mathcal{M}$ depends on some model parameters $\theta$,
and it associates a probability $p_\pi(\theta)$ to each permutation $\pi$ of $[n] = \{1, 2, \ldots, n\}$. Thus the
model is a parametrized subset of the $(n!-1)$-dimensional standard simplex $\Delta_{\mathfrak{S}_n}$.

In algebraic statistics, one assumes that the probabilities $p_\pi(\theta)$ are rational functions
in the model parameters $\theta$, so that $\mathcal{M}$ is a semi-algebraic set in $\Delta_{\mathfrak{S}_n}$, and one aims to
characterize the prime ideal $I_{\mathcal{M}}$ of polynomials that vanish on $\mathcal{M}$. In fact, one of the
origins of the field was the spectral analysis for permutation data described by Diaconis
and Sturmfels in [12, §6.1]. The corresponding Birkhoff model $\mathcal{M}$ is the toric variety of the
Birkhoff polytope. This polytope consists of all bistochastic matrices and it is the convex
hull of all $n \times n$ permutation matrices. There has been a considerable amount of research
on the geometric invariants of the Birkhoff model $\mathcal{M}$. The simplest such invariant is its
dimension, $\dim(\mathcal{M}) = (n-1)^2$. The degree of $\mathcal{M}$ is the normalized volume of the Birkhoff
polytope, a topic of independent interest in combinatorics [5]. Diaconis and Eriksson [11]
conjectured that the Markov basis of the Birkhoff model consists of binomials of degree $\le 3$.

Besides the Birkhoff model, there are many other models for ranked data that are both
relevant for statistical analysis and have an interesting algebraic structure. It is the objective
of this article to conduct a comparative study of such models from the perspectives of
commutative algebra and geometric combinatorics. Both toric models and non-toric models
are of interest. The former include the models introduced by Csiszar [9, 10], and the latter
include the Plackett-Luce model [8, 24, 29] and the generalized Bradley-Terry models [21].

The organization of this paper is as follows. In Section 2 we give an informal introduction
to all our models. We write out formulas for the probabilities for the six permutations of
$n = 3$ items, and we discuss the subsets they parametrize in the 5-dimensional simplex $\Delta_{\mathfrak{S}_3}$.
Precise formal definitions for the four toric models are given in Section 3. We represent the
states as maximal chains in a graded poset $Q$. Typically, $Q$ is the distributive lattice induced
by some order constraints on the $n$ items to be ranked. If there are no such constraints
then $Q = 2^{[n]}$ is the Boolean lattice whose maximal chains are all $n!$ permutations in $\mathfrak{S}_n$.
Non-trivial order constraints arise frequently in applications of ranking models, for instance
in computational biology [1] and machine learning. Our algebraic framework based on
graded posets $Q$ is well-suited for such contemporary applications of statistical ranking.

While the Birkhoff model has already received a lot of attention in the literature, we
here focus on the Csiszar model (Section 4), the ascending model (Section 5) and the inversion
model (Section 6). For each of these toric varieties, we characterize the corresponding
lattice polytope and its Markov bases, that is, binomials that generate the toric ideal.

Section 7 is concerned with the Plackett-Luce model, which is not a toric model, but
is parametrized by certain conditional probabilities that are not monomials. In algebraic
geometry language, this model is obtained by blowing up the projective space $\mathbb{P}^{n-1}$ along
a family of linear subspaces of codimension 2, and we study its coordinate ring. We also
examine marginalizations of our models, including the widely used Bradley-Terry model.

2. Toric Models: A Sneak Preview 

A toric model for complete permutation data is specified by a non-negative integer matrix
$A$ with $n!$ columns that all have the same sum $S$. These column vectors are indexed
by permutations $\pi \in \mathfrak{S}_n$ and they represent the sufficient statistics of the model. The
article [17] serves as our general reference for toric models in statistics, their relationship
with exponential families, and the role of the matrix $A$. For an introduction to algebraic
statistics in general, and for further reading on toric models, we refer to the books [13, 28].

If $r = \mathrm{rank}(A)$ then the convex hull of the column vectors $A_\pi$ is a lattice polytope of
dimension $r - 1$. We refer to it as the model polytope. The toric model can be identified with
the non-negative points on the projective toric variety associated with the model polytope.
Each data set is summarized as a function $u : \mathfrak{S}_n \to \mathbb{N}$, where $u(\pi)$ is the number of times
the permutation $\pi$ has been observed. Thinking of $u$ as a column vector, we can form the
matrix-vector product $Au$, whose entries are the sufficient statistics of the data $u$. Then the
sum of the entries in the vector $Au$ equals $S \cdot N$, where $N = \sum_{\pi \in \mathfrak{S}_n} u(\pi)$ is the sample size.

In subsequent sections we will generalize to the situation where $\mathfrak{S}_n$ is replaced by a proper
subset, in which case $A$ has fewer than $n!$ columns, but still labeled by permutations. These
will be the linear extensions of a given partial order on $[n] = \{1, 2, \ldots, n\}$. In fact, for some
models we can even take the set of maximal chains in an arbitrary ranked poset. But for a
first look we confine ourselves to the situation described above, where $A$ has $n!$ columns.

We now define four toric models for probability distributions on $\mathfrak{S}_n$. We do this by way
of a verbal description of the sufficient statistics in each model. These sufficient statistics
are numerical functions on the permutations $\pi$ of the given set $[n]$ of items to be ranked.

(a) In the ascending model, the sufficient statistics $Au$ record, for each subset $I \subseteq [n]$,
the number of samples $\pi$ in the data $u$ that have the set $I$ at the bottom. Here, the
set $I$ being at the bottom means that ($i \in I$ and $j \notin I$) implies $\pi(i) < \pi(j)$.

(b) In the Csiszar model, the sufficient statistics $Au$ count, for each $i \in I \subseteq [n]$, the
number of samples that have $I$ at the bottom but with $i$ as winner in the group $I$.
This is the model studied by Villo Csiszar [9, 10] under the name "L-decomposable".

(c) In the Birkhoff model of [12, §6.1], the sufficient statistics $Au$ of a data set $u$ record,
for each $i, j \in [n]$, the number of samples $\pi$ in which object $i$ is ranked in place $j$.

(d) In the inversion model, the sufficient statistics $Au$ count, for each ordered pair
$i < j$ in $[n]$, the number of samples $\pi$ in which that pair is an inversion, meaning
$\pi^{-1}(i) > \pi^{-1}(j)$. This model can be seen as a multivariate version of the Mallows
model [25].
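
To make the sufficient statistics concrete, the following minimal sketch builds the matrix $A$ of the ascending model for $n = 3$ and applies it to a data vector $u$. It assumes that a permutation is written as a word listing the items from the bottom up, so that the sets at the bottom are exactly the initial segments of the word. The data values and the helper name `ascending_matrix` are made up for illustration.

```python
from itertools import permutations

def ascending_matrix(n):
    """Rows: non-empty subsets I of {1..n}; columns: orderings of 1..n.
    An entry is 1 when I is an initial segment ("at the bottom") of the ordering."""
    perms = list(permutations(range(1, n + 1)))
    subsets = sorted({frozenset(p[:k]) for p in perms for k in range(1, n + 1)},
                     key=lambda s: (len(s), sorted(s)))
    rows = [[1 if frozenset(p[:len(I)]) == I else 0 for p in perms]
            for I in subsets]
    return subsets, perms, rows

subsets, perms, A = ascending_matrix(3)

# Hypothetical data: u[k] counts how often the ordering perms[k] was observed.
u = [4, 0, 2, 1, 0, 3]

# Sufficient statistics Au, one entry per subset I.
Au = [sum(a * x for a, x in zip(row, u)) for row in A]

N = sum(u)                      # sample size
S = sum(row[0] for row in A)    # common column sum of A

for I, value in zip(subsets, Au):
    print(sorted(I), value)
print("sum of Au:", sum(Au), "S * N:", S * N)
```

Every column of this matrix has coordinate sum $S = 3$, so the entries of $Au$ add up to $S$ times the sample size.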

To illustrate the differences between these models let us consider the simplest case $n = 3$.
In each case the toric ideal of the model is the kernel of a square-free monomial map from the
polynomial ring $K[p_{123}, p_{132}, p_{213}, p_{231}, p_{312}, p_{321}]$ representing the probabilities to another
polynomial ring $K[a, b, \ldots]$ that represents the model parameters. The model polytope is
the convex hull of the six 0-1 vectors corresponding to the square-free monomials:

$$
\begin{array}{l|cccccc}
 & p_{123} & p_{132} & p_{213} & p_{231} & p_{312} & p_{321} \\ \hline
\text{Birkhoff} & a_{11}a_{22}a_{33} & a_{11}a_{23}a_{32} & a_{12}a_{21}a_{33} & a_{12}a_{23}a_{31} & a_{13}a_{21}a_{32} & a_{13}a_{22}a_{31} \\
\text{inversion} & b_{12}b_{13}b_{23} & b_{12}b_{13}g_{23} & g_{12}b_{13}b_{23} & g_{12}g_{13}b_{23} & b_{12}g_{13}g_{23} & g_{12}g_{13}g_{23} \\
\text{ascending} & c_{1}c_{12}c_{123} & c_{1}c_{13}c_{123} & c_{2}c_{12}c_{123} & c_{2}c_{23}c_{123} & c_{3}c_{13}c_{123} & c_{3}c_{23}c_{123} \\
\text{Csiszar} & d_{|1}d_{1|2}d_{12|3} & d_{|1}d_{1|3}d_{13|2} & d_{|2}d_{2|1}d_{12|3} & d_{|2}d_{2|3}d_{23|1} & d_{|3}d_{3|1}d_{13|2} & d_{|3}d_{3|2}d_{23|1}
\end{array}
$$

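The toric ideals record the algebraic relations among these square-free monomials:

$$
\begin{array}{ll}
I_{inv} = \langle\, p_{132}p_{231} - p_{123}p_{321},\ p_{213}p_{312} - p_{123}p_{321} \,\rangle & \text{has codimension 2,} \\
I_{birk} = I_{asc} = \langle\, p_{123}p_{231}p_{312} - p_{132}p_{213}p_{321} \,\rangle & \text{has codimension 1,} \\
I_{csi} = (0) & \text{has codimension 0.}
\end{array}
$$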
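
Membership of these binomials in the respective toric ideals can be checked by comparing parameter multisets: a binomial lies in the toric ideal exactly when both of its monomials have the same image under the monomial map. The sketch below does this for $n = 3$, with the images of the $p_\pi$ transcribed from the table above as we read it. The dictionary layout and the function names are illustrative choices, not notation from the paper.

```python
from collections import Counter

# Monomial image of each p_pi, written as the list of parameters it contains.
MODELS = {
    "birkhoff": {
        "123": ["a11", "a22", "a33"], "132": ["a11", "a23", "a32"],
        "213": ["a12", "a21", "a33"], "231": ["a12", "a23", "a31"],
        "312": ["a13", "a21", "a32"], "321": ["a13", "a22", "a31"],
    },
    "inversion": {
        "123": ["b12", "b13", "b23"], "132": ["b12", "b13", "g23"],
        "213": ["g12", "b13", "b23"], "231": ["g12", "g13", "b23"],
        "312": ["b12", "g13", "g23"], "321": ["g12", "g13", "g23"],
    },
    "ascending": {
        "123": ["c1", "c12", "c123"], "132": ["c1", "c13", "c123"],
        "213": ["c2", "c12", "c123"], "231": ["c2", "c23", "c123"],
        "312": ["c3", "c13", "c123"], "321": ["c3", "c23", "c123"],
    },
    "csiszar": {
        "123": ["d|1", "d1|2", "d12|3"], "132": ["d|1", "d1|3", "d13|2"],
        "213": ["d|2", "d2|1", "d12|3"], "231": ["d|2", "d2|3", "d23|1"],
        "312": ["d|3", "d3|1", "d13|2"], "321": ["d|3", "d3|2", "d23|1"],
    },
}

def image(model, monomial):
    """Multiset of parameters in the image of a monomial in the p_pi."""
    total = Counter()
    for pi in monomial:
        total.update(MODELS[model][pi])
    return total

def vanishes(model, left, right):
    """True when the binomial left - right lies in the toric ideal of the model."""
    return image(model, left) == image(model, right)

cubic = (["123", "231", "312"], ["132", "213", "321"])
quad1 = (["132", "231"], ["123", "321"])
quad2 = (["213", "312"], ["123", "321"])

for model in MODELS:
    print(model, vanishes(model, *cubic), vanishes(model, *quad1), vanishes(model, *quad2))
```

The cubic is sent to zero by the Birkhoff and ascending maps, the two quadrics by the inversion map, and none of the three by the Csiszar map.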



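For each model, the matrix $A$ has six columns, indexed by $\mathfrak{S}_3$, and its rows are labeled
by the model parameters. For example, for the ascending model, the matrix has seven rows:

$$
A_{\mathfrak{S}_3} \;=\;
\begin{array}{c|cccccc}
 & p_{123} & p_{132} & p_{213} & p_{231} & p_{312} & p_{321} \\ \hline
c_{1} & 1 & 1 & 0 & 0 & 0 & 0 \\
c_{2} & 0 & 0 & 1 & 1 & 0 & 0 \\
c_{3} & 0 & 0 & 0 & 0 & 1 & 1 \\
c_{12} & 1 & 0 & 1 & 0 & 0 & 0 \\
c_{13} & 0 & 1 & 0 & 0 & 1 & 0 \\
c_{23} & 0 & 0 & 0 & 1 & 0 & 1 \\
c_{123} & 1 & 1 & 1 & 1 & 1 & 1
\end{array}
$$

Here we use the same notation for both the matrix and the model polytope, which is the
convex hull of the columns. From the equality of ideals, $I_{birk} = I_{asc}$, we infer that the
polytope $A_{\mathfrak{S}_3}$ is affinely isomorphic to the $3 \times 3$-Birkhoff polytope, which is a cyclic 4-polytope
with six vertices. The ideal $I_{inv}$ reveals that the model polytope for the inversion model is
a regular octahedron, while the polytope for the Csiszar model is the full 5-simplex.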
To see that no two of our four models agree, we need to go to $n \ge 4$.

Example 2.1. Let $n = 4$. Then all four model polytopes have 24 vertices but their dimensions
are different. The Birkhoff model has dimension 9, the inversion model has dimension
6, the ascending model has dimension 11, and the Csiszar model has dimension 17.
Theorem 3.1 will explain the precise relationships and inclusions among the four models. □
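
The dimensions in Example 2.1 can be recomputed as ranks of 0-1 matrices. The sketch below is one possible way to do this, assuming that permutations are written as words listing the items from the bottom up and that the parameters of each model are as described in (a)-(d). The function names and the encoding of parameters as tuples are our own choices.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of an integer matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk

def statistics(model, word):
    """Parameters occurring in the monomial image of p_word."""
    n = len(word)
    if model == "birkhoff":    # a_{k, item}: which item sits in position k
        return [("a", k + 1, item) for k, item in enumerate(word)]
    if model == "inversion":   # u_{ij} or v_{ij} for every pair i < j
        pos = {item: k for k, item in enumerate(word)}
        return [("v" if pos[i] > pos[j] else "u", i, j)
                for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    if model == "ascending":   # c_I for every non-empty initial segment I
        return [("c", frozenset(word[:k])) for k in range(1, n + 1)]
    if model == "csiszar":     # d_{I|i}: item i is placed right after the set I
        return [("d", frozenset(word[:k]), word[k]) for k in range(n)]
    raise ValueError(model)

def polytope_dimension(model, n):
    words = list(permutations(range(1, n + 1)))
    index = {}
    for w in words:
        for s in statistics(model, w):
            index.setdefault(s, len(index))
    vertices = []
    for w in words:
        v = [0] * len(index)
        for s in statistics(model, w):
            v[index[s]] = 1
        vertices.append(v)
    # All vertices have the same coordinate sum, so the affine dimension
    # of their convex hull is the rank of the vertex matrix minus one.
    return rank(vertices) - 1

MODEL_NAMES = ("birkhoff", "inversion", "ascending", "csiszar")
dims3 = {model: polytope_dimension(model, 3) for model in MODEL_NAMES}
dims4 = {model: polytope_dimension(model, 4) for model in MODEL_NAMES}
print(dims3)
print(dims4)
```

Exact rational arithmetic is used instead of floating point so that the rank is not affected by rounding.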



Our work on this project started by trying to understand a certain model whose toric
closure is the ascending model. Here toric closure refers to the smallest toric model containing
a given model. That non-toric model for ranking is the Plackett-Luce model [8, 24, 29].
It can be obtained from the ascending model by the following specialization of parameters:

$$
c_i \mapsto \frac{1}{\theta_i}, \qquad
c_{ij} \mapsto \frac{1}{\theta_i + \theta_j}, \qquad
c_{ijk} \mapsto \frac{1}{\theta_i + \theta_j + \theta_k}, \qquad \ldots
$$

The prime ideal of algebraic relations among the $p_\pi$ is a non-toric ideal which contains the
toric ideal $I_{asc}$. The case $n = 3$ is worked out explicitly in Example 7.1. Geometrically, that
smallest Plackett-Luce model corresponds to blowing up $\mathbb{P}^2$ at the nine points in (19).



3. Toric Models: Definitions and General Results 

Let $Q$ be a poset on a finite ground set. A $Q$-ranking is a maximal chain $a_0 < \cdots < a_n$ in
$Q$. A chain $a_0 < \cdots < a_n$ being maximal means that $a_0$ is minimal in $Q$, $a_n$ is maximal, and
$a_i < a_{i+1}$ is a cover relation for $0 \le i \le n-1$. We write $M(Q)$ for the set of maximal chains
in $Q$ and $\mathrm{Cov}(Q)$ for the set of cover relations in $Q$. If $Q = 2^{[n]}$ is the Boolean lattice of all
subsets of $[n]$ ordered by inclusion then the maximal chains in $Q$ are in bijection with the
permutations in $\mathfrak{S}_n$, and the models below coincide with the ones described in Section 2.

We shall define four toric models whose states are the maximal chains $\pi \in M(Q)$. The
probability of $\pi$ is represented by an indeterminate $p_\pi$. Each toric model for $Q$-rankings is
defined by a non-negative integer matrix $A$ whose columns are indexed by $M(Q)$ and have
a fixed coordinate sum $S$. The matrix $A$ represents a monomial map from the polynomial
ring $K[p]$ in the unknowns $p_\pi$, $\pi \in M(Q)$, to a suitably chosen second polynomial ring.

Any data set gives a function $u : M(Q) \to \mathbb{N}$, where $u(\pi)$ is the number of times the
maximal chain $\pi$ has been observed. Thinking of $u$ as a column vector, we can form the
matrix-vector product $Au$, whose entries are the sufficient statistics of the data set $u$. The
coordinate sum of the vector $Au$ is equal to $S$ times the sample size $N = \sum_{\pi \in M(Q)} u(\pi)$.

(a) In the ascending model, the sufficient statistic $Au$ records, for any given poset
element $a \in Q$, the number of observed maximal chains $\pi$ that pass through $a$. The
model parameters are represented by unknowns $c_a$, and the monomial map is

$$ p_\pi \mapsto c_{a_0} c_{a_1} \cdots c_{a_n} \quad \text{for } \pi = (a_0 < a_1 < \cdots < a_n). $$

(b) In the Csiszar model, the sufficient statistic $Au$ records, for any cover $a < b$, the
number of observed maximal chains $\pi$ passing through $a$ and $b$. The model parameters
are represented by unknowns $d_{a<b}$ for $(a<b) \in \mathrm{Cov}(Q)$. The monomial map is

$$ p_\pi \mapsto d_{a_0<a_1} d_{a_1<a_2} \cdots d_{a_{n-1}<a_n} \quad \text{for } \pi = (a_0 < a_1 < \cdots < a_n). $$

If $Q = 2^{[n]}$, the Boolean lattice of subsets $a \subseteq [n]$, then the maximal chains $\pi$ in $Q$
are identified with permutations in $\mathfrak{S}_n$, and we recover the ascending model as defined in
Section 2. Likewise we recover the Csiszar model on $\mathfrak{S}_n$ by setting $d_{a<b} = d_{a|i}$ for $\{i\} = b \setminus a$.

The Birkhoff and inversion models cannot be formulated in the above generality. For these
we need to assume that the poset $Q$ is a distributive lattice. This means that $Q = \mathcal{O}(\mathcal{P})$ is
the poset of order ideals in a given partial order $\mathcal{P}$ on $[n]$. We refer to $\mathcal{P}$ as the constraint
poset. The constraint $i < j$ stipulates that item $i$ must always be ranked before item $j$. The
maximal chains $\pi$ in $Q = \mathcal{O}(\mathcal{P})$ are the permutations of $[n]$ that respect all constraints in
$\mathcal{P}$. See [4] for an introduction to distributive lattices in a context of statistical interest.

The compatible permutations $\pi$ are known as linear extensions of $\mathcal{P}$. From now on we
abbreviate $\mathcal{L}(\mathcal{P}) = M(\mathcal{O}(\mathcal{P}))$, and we identify elements of $\mathcal{L}(\mathcal{P})$ with permutations $\pi \in \mathfrak{S}_n$
that represent linear extensions of $\mathcal{P}$. This allows us to define our other two toric models:

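(c) In the Birkhoff model, the sufficient statistic $Au$ records, for all $i, j \in [n]$, the
number of samples $\pi \in \mathcal{L}(\mathcal{P})$ for which object $j$ is ranked in position $i$. The model
parameters are represented by unknowns $a_{ij}$ for $i, j \in [n]$. The monomial map is

$$ p_\pi \mapsto \prod_{i=1}^{n} a_{i\pi(i)} \quad \text{for } \pi \in \mathcal{L}(\mathcal{P}). $$

(d) In the inversion model, the sufficient statistic $Au$ records, for each ordered pair $i, j$
in $[n]$, the number of samples $\pi \in \mathcal{L}(\mathcal{P})$ for which $i < j$ but $\pi^{-1}(i) > \pi^{-1}(j)$. The
model parameters are represented by unknowns $u_{ij}$ and $v_{ij}$. The monomial map is

$$
p_\pi \mapsto
\prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) < \pi^{-1}(j)}} u_{ij}
\prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) > \pi^{-1}(j)}} v_{ij}
\quad \text{for } \pi \in \mathcal{L}(\mathcal{P}).
$$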



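In general, we have the following inclusions among the four toric models (a)-(d). These
inclusions of toric varieties correspond to linear projections among the model polytopes.

Theorem 3.1. (i) The ascending model and the Csiszar model on a poset $Q$ satisfy

$$ \mathcal{M}_{asc} \subset \mathcal{M}_{csi}, $$

provided $Q$ has either a unique minimal element $\hat{0}$ or a unique maximal element $\hat{1}$.

(ii) If $Q = \mathcal{O}(\mathcal{P})$ is a distributive lattice, then the Birkhoff model $\mathcal{M}_{birk}$, the inversion
model $\mathcal{M}_{inv}$, the ascending model $\mathcal{M}_{asc}$ and the Csiszar model $\mathcal{M}_{csi}$ satisfy

$$ \mathcal{M}_{inv} \subset \mathcal{M}_{csi} \quad \text{and} \quad \mathcal{M}_{birk} \subset \mathcal{M}_{asc} \subset \mathcal{M}_{csi}. $$

(iii) The inclusions in (ii) are strict in general. Moreover, if $n \ge 4$ and $Q = 2^{[n]}$ then

$$ \mathcal{M}_{inv} \not\subset \mathcal{M}_{asc} \quad \text{and} \quad \mathcal{M}_{birk} \not\subset \mathcal{M}_{inv}. $$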

Proof. We begin by establishing (iii). The fact that the inclusions in (ii) are strict follows
from Example 2.1. For the second part of (iii) consider $n = 4$. A direct computation as in
Section 6 reveals that the inversion model $\mathcal{M}_{inv}$ is a projective toric variety of dimension
6 and degree 180 in $\mathbb{P}^{23}$. The Markov basis of $I_{inv}$ consists of 81 quadrics. Since $\mathcal{M}_{birk}$ has
dimension 9, we conclude that $\mathcal{M}_{birk} \not\subset \mathcal{M}_{inv}$. An explicit point $p$ in $\mathcal{M}_{birk} \setminus \mathcal{M}_{inv}$ is the
uniform distribution on the nine derangements. This arises by setting $a_{ii} = 0$ for all $i$ and
$a_{ij} = 1/\sqrt{3}$ for all $i \ne j$. The quadric $p_{1243}p_{4321} - p_{2143}p_{4312} \in I_{inv}$ does not vanish for this
particular distribution.
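
This counterexample is small enough to check numerically. The sketch below assumes the Birkhoff parametrization $p_\pi = \prod_i a_{i\pi(i)}$ with permutations written as words, and it counts a pair $i < j$ as an inversion when $j$ appears before $i$ in the word. Variable and function names are illustrative only.

```python
import math
from itertools import permutations

n = 4
# Birkhoff parameters: a[i][j] = 0 on the diagonal and 1/sqrt(3) elsewhere.
a = [[0.0 if i == j else 1 / math.sqrt(3) for j in range(n)] for i in range(n)]

def birkhoff_probability(word):
    """Monomial prod_i a_{i, word(i)} for a permutation written as a word."""
    return math.prod(a[i][item - 1] for i, item in enumerate(word))

p = {w: birkhoff_probability(w) for w in permutations(range(1, n + 1))}
support = [w for w, value in p.items() if value > 0]

def inverted_pairs(word):
    """Pairs (i, j) with i < j such that j appears before i in the word."""
    return [(i, j) for k, j in enumerate(word) for i in word[k + 1:] if i < j]

# The quadric p_1243 p_4321 - p_2143 p_4312 from the proof.
lhs = [(1, 2, 4, 3), (4, 3, 2, 1)]
rhs = [(2, 1, 4, 3), (4, 3, 1, 2)]

# Each word contributes every pair exactly once (inverted or not), so the
# two monomials have the same image under the inversion map as soon as
# their multisets of inverted pairs agree.
in_inversion_ideal = (
    sorted(inverted_pairs(lhs[0]) + inverted_pairs(lhs[1]))
    == sorted(inverted_pairs(rhs[0]) + inverted_pairs(rhs[1]))
)
value = p[lhs[0]] * p[lhs[1]] - p[rhs[0]] * p[rhs[1]]

print(len(support), sum(p.values()), in_inversion_ideal, value)
```

The first monomial vanishes because 1243 fixes the item 1, while 2143 and 4312 are both derangements, so the quadric evaluates to $-1/81$.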

The ascending model $\mathcal{M}_{asc}$ has dimension 11 and degree 808. The Markov basis of its
toric ideal $I_{asc}$ consists of six quadrics, 64 cubics and 93 quartics. One of the cubics is

(1) $p_{1234}\,p_{1342}\,p_{1423} - p_{1243}\,p_{1324}\,p_{1432} \in I_{asc}$.

An example of a point in $\mathcal{M}_{inv} \setminus \mathcal{M}_{asc}$ is obtained by taking the parameter values

$$ u_{12} = u_{13} = u_{14} = 0, \quad u_{23} = u_{24} = u_{34} = v_{12} = v_{13} = v_{23} = v_{24} = 1, \quad v_{34} = 2, \quad v_{14} = 1/9. $$

The resulting distribution is supported on the six permutations in (1). Its coordinates are

$$ p_{1234} = p_{1342} = p_{1423} = 2/9 \quad \text{and} \quad p_{1243} = p_{1324} = p_{1432} = 1/9. $$

This distribution is not a zero of (1), and hence it is not in the ascending model $\mathcal{M}_{asc}$.
The two probability distributions on permutations seen above can be lifted to similar
counterexamples for $n \ge 5$, and we conclude that the non-inclusions are valid for all $n \ge 4$.

The inclusion $\mathcal{M}_{asc} \subset \mathcal{M}_{csi}$ in (i) is seen by the specialization of parameters that sends
$d_{a<b}$ to $c_a$ if $Q$ has a unique maximal element $\hat{1}$ and to $c_b$ if $Q$ has a unique minimum $\hat{0}$.

We lastly prove the inclusions in (ii). The parameters for the Csiszar model $\mathcal{M}_{csi}$ are $d_{a<b}$
where $a < b \in \mathrm{Cov}(Q)$. If $M(Q) = \mathcal{L}(\mathcal{P})$ then the cover relation $a < b$ means $b = a \cup \{j\}$.
Thus the following specialization of parameters gives the parameterization of $\mathcal{M}_{inv}$:

$$ d_{a<b} \mapsto \prod_{i \in a,\ i < j} u_{ij} \prod_{i \in a,\ i > j} v_{ji}. $$

This shows that the inversion model $\mathcal{M}_{inv}$ is a subvariety of the Csiszar model $\mathcal{M}_{csi}$.

It remains to show that $\mathcal{M}_{birk} \subset \mathcal{M}_{asc}$. To do this, we let $A$ denote the model matrix for
$\mathcal{M}_{birk}$ and $B$ the model matrix for $\mathcal{M}_{asc}$. Both matrices have their entries in $\{0, 1\}$ and
they have $|\mathcal{L}(\mathcal{P})|$ columns. The rows $A_{ij}$ of $A$ are indexed by ordered pairs $(i,j) \in [n] \times [n]$,
and the rows $B_I$ of $B$ are indexed by subsets $I$ of $[n]$. We have the identity

$$
A_{ij} \;=\; \sum \Big\{ B_I : I \in \tbinom{[n]}{i} \text{ and } j \in I \Big\}
\;-\; \sum \Big\{ B_I : I \in \tbinom{[n]}{i-1} \text{ and } j \in I \Big\}.
$$

This shows that every row of $A$ is a $\mathbb{Z}$-linear combination of the rows of $B$. Hence, the kernel
of $A$ contains the kernel of $B$, and this implies that the toric ideal $I_A = I_{birk}$ contains the
toric ideal $I_B = I_{asc}$. We conclude that $\mathcal{M}_{birk}$ is a submodel of $\mathcal{M}_{asc}$. □
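
The row identity used in the last step can be checked by brute force for small $n$. In the sketch below, which is only an illustration, $A_{ij}$ is read as the indicator that item $j$ sits in position $i$, and $B_I$ as the indicator that $I$ is the set of the first $|I|$ items. Permutations are words, and the case without constraints ($\mathcal{L}(\mathcal{P}) = \mathfrak{S}_n$) is assumed.

```python
from itertools import combinations, permutations

def check_row_identity(n):
    """Verify A_ij = sum_{|I|=i, j in I} B_I - sum_{|I|=i-1, j in I} B_I
    on every permutation of 1..n."""
    items = range(1, n + 1)
    for word in permutations(items):

        def B(I):
            # Ascending-model row B_I evaluated at this ordering.
            return 1 if set(word[:len(I)]) == set(I) else 0

        for i in items:        # position
            for j in items:    # item
                A_ij = 1 if word[i - 1] == j else 0
                upper = sum(B(I) for I in combinations(items, i) if j in I)
                lower = sum(B(I) for I in combinations(items, i - 1) if j in I)
                if A_ij != upper - lower:
                    return False
    return True

print(check_row_identity(3), check_row_identity(4))
```

The first sum tests whether $j$ is among the first $i$ items and the second whether it is among the first $i - 1$, so their difference is 1 exactly when $j$ occupies position $i$.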

In the rest of this paper we consider the ascending and Csiszar models only in the graded
situation, that is, when the monomial images of all the unknowns $p_c$, $c \in M(Q)$, have the
same total degree. The latter is equivalent to requiring that all maximal chains in $Q$ have
the same cardinality, which in turn is equivalent to $Q$ being graded. For a graded poset $Q$
we denote by $\mathrm{rk} : Q \to \mathbb{N}$ its rank function and write $Q_i$ for the set of its elements of rank
$i$. By $\mathrm{rk}(Q)$ we denote the rank of $Q$, which is the maximal rank of any of its elements.

In the next three sections we undertake a detailed study of the models (b), (a) and
(d), in this order. The Birkhoff model (c) has already received considerable attention in
the literature [11, 12], at least for $\mathcal{L}(\mathcal{P}) = \mathfrak{S}_n$, and we content ourselves with a few brief
remarks. Its model polytope, the Birkhoff polytope of doubly stochastic matrices, is a key
player in combinatorial optimization, and it is linked to many fields of pure mathematics.

The restriction of the Birkhoff model and its polytope to proper subsets $\mathcal{L}(\mathcal{P})$ of $\mathfrak{S}_n$
has been studied only in some special cases. For example, Chan, Robbins and Yuen [7]
considered this polytope for the constraint poset $\mathcal{P}$ given by the transitive closure of $j > j - 2$
and $j > j - 3$ for $3 \le j \le n$. They stated a conjecture on its volume which was proved
by Zeilberger [31]. We close by noting a formula for the dimension of these polytopes.

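Proposition 3.2. Let $\mathcal{P}$ be an arbitrary constraint poset on $[n] = \{1, 2, \ldots, n\}$. Set

$$ Z = \{\, (i,j) \in [n] \times [n] \mid \pi(i) \ne j \text{ for all } \pi \in \mathcal{L}(\mathcal{P}) \,\} $$

and

$$
C = \left\{\, (i,j) \in [n] \times [n] \;\middle|\; (i,j) \notin Z \text{ and }
\begin{array}{l}
(i,j') \in Z \text{ for all } j' > j \text{ or} \\
(i',j) \in Z \text{ for all } i' > i
\end{array}
\right\}.
$$

The model polytope $B_{\mathcal{P}}$ of the Birkhoff model, expressed using coordinates $x_{ij}$ on $\mathbb{R}^{n \times n}$,
equals the face of the classical Birkhoff polytope of bistochastic $n \times n$-matrices defined by

(2) $x_{ij} = 0$ for all $(i,j) \in Z$.

In particular, the dimension of the Birkhoff model polytope is $\dim(B_{\mathcal{P}}) = n^2 - |Z| - |C|$.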

Proof. Clearly, the model polytope $B_{\mathcal{P}}$ of the Birkhoff model is contained in the classical
Birkhoff polytope. Equally obvious is that all equations (2) are valid for the model polytope.
Hence $B_{\mathcal{P}}$ is contained in the polytope cut out from the classical Birkhoff polytope by (2).
Following the lines of the Birkhoff-von Neumann Theorem (see e.g. [1, (5.2)]), we note
that the vertices of the polytope cut out by (2) from the classical Birkhoff polytope are
the permutation matrices for the permutations $\pi \in \mathcal{L}(\mathcal{P})$. The first assertion now follows.

The linear relations on the Birkhoff polytope state that all row and column sums are 1.
We set $x_{ij} = 0$ for $(i,j) \in Z$. In the resulting linear relations precisely the variables $x_{ij}$
for $(i,j) \in C$ are the leading terms with respect to the order of the variables induced by the
lexicographic order on the index tuples. This proves the dimension statement. □

We illustrate Proposition 3.2 with two simple examples. If $\mathcal{P}$ is an $n$-element antichain
then $Z = \emptyset$ and $C = \{(1,n), (2,n), \ldots, (n,n), (n,n-1), \ldots, (n,1)\}$. Here our formula gives
the dimension $n^2 - 0 - (2n-1) = (n-1)^2$ of the classical Birkhoff polytope. If $\mathcal{P}$ is the
$n$-chain $1 < 2 < \cdots < n$ then $Z = \{(i,j) \in [n] \times [n] \mid i \ne j\}$ and $C = \{(i,i) \mid i \in [n]\}$. Here the
model polytope is just one point, since $\dim(B_{\mathcal{P}}) = n^2 - |Z| - |C| = n^2 - n(n-1) - n = 0$.
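
The two examples can be reproduced mechanically. The sketch below enumerates linear extensions by brute force and then forms $Z$ and $C$. It rests on an assumption about the definition of $C$: a cell $(i,j)$ outside $Z$ is put into $C$ when all cells to its right in row $i$, or all cells below it in column $j$, lie in $Z$, which is the reading that both examples above require. The function names and the encoding of constraints as pairs `(a, b)` meaning "$a$ before $b$" are hypothetical.

```python
from itertools import permutations

def linear_extensions(n, relations):
    """Words in which a precedes b for every constraint (a, b)."""
    exts = []
    for word in permutations(range(1, n + 1)):
        pos = {item: k for k, item in enumerate(word)}
        if all(pos[a] < pos[b] for a, b in relations):
            exts.append(word)
    return exts

def birkhoff_data(n, relations):
    """Return (n^2 - |Z| - |C|, Z, C) for the given constraints."""
    exts = linear_extensions(n, relations)
    cells = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    # (i, j) is used when some linear extension pi has pi(i) = j.
    used = {(i, word[i - 1]) for word in exts for i in range(1, n + 1)}
    Z = {cell for cell in cells if cell not in used}
    C = {
        (i, j) for (i, j) in cells
        if (i, j) not in Z and (
            all((i, jj) in Z for jj in range(j + 1, n + 1))
            or all((ii, j) in Z for ii in range(i + 1, n + 1))
        )
    }
    return n * n - len(Z) - len(C), Z, C

n = 4
antichain_dim, Z_anti, C_anti = birkhoff_data(n, [])
chain_dim, Z_chain, C_chain = birkhoff_data(n, [(k, k + 1) for k in range(1, n)])
print(antichain_dim, len(Z_anti), len(C_anti))
print(chain_dim, len(Z_chain), len(C_chain))
```

For the antichain this gives $|C| = 2n - 1$ and dimension $(n-1)^2$. For the chain only the identity permutation survives and the dimension is 0.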

4. The Csiszar model 

The Csiszar model for the Boolean lattice $Q = 2^{[n]}$ was studied by Villo Csiszar in
[9, 10]. She calls it the L-decomposable model where the letter "L" refers to Luce [24].
Indeed, the model can be seen as the generic model satisfying Luce-decomposability (see
[25]). We prefer to call it the Csiszar model, to credit her work for introducing this model
into algebraic statistics. We note that the Csiszar model for $Q = 2^{[n]}$ also appears in work
on multiple testing procedures by Hommel et al. [20], but with a different coordinatization
of its model polytope. Throughout this section, we fix a graded poset $Q$ of positive rank.

We begin by defining a 0-1-matrix $A = C_Q$ that represents the Csiszar model. Our
construction is based on the technique employed for $Q = 2^{[n]}$ in Csiszar's proof of [9,
Theorem 1]. The columns of $C_Q$ are indexed by the unknown probabilities $p_\pi$ where $\pi \in M(Q)$,
and the rows of $C_Q$ are indexed by the model parameters $d_{a<b}$ where $(a<b) \in \mathrm{Cov}(Q)$.
We write $\mathrm{MinCov}(Q)$ for the set of cover relations $a < b$ for some element $a \in Q_0$ of rank 0.

Consider the discrete undirected graphical model [13, 17] given by the $n$-chain graph
$G = ([n], E)$ with edge set $E = \{\{i, i+1\} \mid 1 \le i \le n-1\}$. We take as the states
of node $i$ the set $Q_i$ of all elements of rank $i$ in $Q$. The $n$-chain graph $G$ is chordal (or
decomposable), so the five equivalent conditions of [17, Theorem 4.4] hold for $G$. Let $A_G$
denote the associated model matrix [17, §2.2]. It has $\prod_{i=0}^{n} |Q_i|$ columns indexed by tuples
$(a_0, \ldots, a_n)$ of elements $a_i \in Q_i$ and $\sum_{i=0}^{n-1} |Q_i| \cdot |Q_{i+1}|$ rows indexed by pairs $(a, b)$ of
elements of $Q$ from consecutive ranks. Its entries are 0 or 1 according to the pattern for an
undirected graphical model. More precisely, in row $(a, b)$ all entries are 0 except for those in
the columns indexed by tuples containing $a$ and $b$. We shall use the following key facts from
[17, Theorem 4.4]: the image of the monomial map given by $A_G$ is closed, and the cone spanned
by the columns of $A_G$ contains all non-negative vectors in the column space of $A_G$.

As in Csiszar's proof of [9, Theorem 1], we focus on the submatrix $A'_G$ of $A_G$ whose
column labels $(a_0, \ldots, a_n)$ correspond to maximal chains $a_0 < \cdots < a_n$ from $M(Q)$. Many
of the rows of $A'_G$ are entirely zero, namely, all those rows indexed by pairs $(a, b)$, where
$a$ is not covered by $b$ in $Q$. Let $A_Q$ denote the matrix obtained from $A'_G$ by deleting all
such zero rows. The remaining rows are indexed by pairs $(a, b) \in Q_i \times Q_{i+1}$ for some $i$.
Equivalently, the rows of $A_Q$ are indexed by $\mathrm{Cov}(Q)$. This shows that the toric model $A_Q$
is precisely our Csiszar model, and, with this identification of coordinates, our polytope $C_Q$
coincides with the convex hull of the columns of $A_Q$. Now we are in a position to give a
description of the model polytope $C_Q$ in terms of linear equalities and inequalities.

Theorem 4.1. Let $Q$ be a graded poset of rank $n \ge 1$ and $C_Q \subset \mathbb{R}^{\mathrm{Cov}(Q)}$ the model polytope
of its Csiszar model, with coordinates $x_{a<b}$ indexed by cover relations $a < b$ in $\mathrm{Cov}(Q)$.
Then $C_Q$ is of dimension $|\mathrm{Cov}(Q)| - |Q| + |Q_n| + |Q_0| - 1$. Inside the orthant defined by

(3) $x_{a<b} \ge 0$ for $(a<b) \in \mathrm{Cov}(Q)$,

the polytope $C_Q$ is the solution set of the inhomogeneous linear equation

(4) $\sum_{a<b \,\in\, \mathrm{MinCov}(Q)} x_{a<b} = 1$

together with the system of linear homogeneous equations

(5) $\sum_{b \in \nabla a} x_{a<b} = \sum_{b \in \Delta a} x_{b<a}$ for all $a \in Q \setminus (Q_0 \cup Q_n)$,

where $\nabla a$ is the set of $b$ that cover $a$, and $\Delta a$ is the set of $b$ that are covered by $a$.

Proof. Let $G = ([n], E)$ with $E = \{\{i, i+1\} \mid 1 \le i \le n-1\}$ be the $n$-chain and $A_G$ the
defining matrix of its graphical model as discussed above. Also let $A'_G$ and $A_Q$ be as above.

The $n$-chain graph $G$ is decomposable, so the five equivalent conditions in [17, Theorem
4.4] are true. The fifth condition, that the exponential family is closed in the probability
simplex, is equivalent to the statement that the model polytope of that $n$-chain model is
defined by linear equations and non-negativity constraints only. See [17] for a toric algebra
perspective. We have shown that the toric model of $A_Q$ is our Csiszar model. With this
identification, the model polytope $C_Q$ coincides with the convex hull of the columns of $A_Q$.

The matrix $A_Q$ was constructed so that its columns are precisely the points on a face
of the model polytope for $A_G$. Hence the model polytope of the Csiszar model is obtained
from the earlier polytope by simply setting some of the non-negative coordinates to zero.
This implies that $C_Q$ inherits all the desirable properties spelled out in Theorem 4.4 of [17].
In particular, its exponential family is closed, and the polytope $C_Q$ coincides with the set
of all non-negative points in the affine space spanned by the columns of the matrix $A_Q$.

At this stage we only need to show that the affine span of the columns of $A_Q$ equals the
solution space of (4) and (5). The equation (4) holds for a vertex of the model polytope
because any maximal chain contains exactly one cover relation involving an element of
rank 0 and an element of rank 1. The equations (5) hold for a vertex of the model polytope
because, given any element $a \in Q$, a maximal chain either contains no cover relation
involving $a$ or exactly two, one of the form $b < a$ and one of the form $a < b'$. Hence each
column of $A_Q$ satisfies (4) and (5). Conversely, any 0-1-solution of these equations must
come from a maximal chain in $Q$, and hence is among the columns of $A_Q$. □
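
For the Boolean lattice the statements of Theorem 4.1 are easy to test by enumeration. The sketch below evaluates the dimension formula and checks two conditions on every vertex: that exactly one coordinate indexed by a cover relation starting in rank 0 equals 1, and that at each element of intermediate rank the coordinates of the cover relations entering and leaving it have equal sums. This wording of equations (4) and (5) is our reading of the theorem, and the function names are illustrative.

```python
from itertools import combinations, permutations

def boolean_lattice(n):
    """Elements and cover relations of the lattice of subsets of {1..n}."""
    elems = [frozenset(c) for k in range(n + 1)
             for c in combinations(range(1, n + 1), k)]
    covers = [(a, a | {i}) for a in elems for i in range(1, n + 1) if i not in a]
    return elems, covers

def csiszar_dimension(n):
    """|Cov(Q)| - |Q| + |Q_n| + |Q_0| - 1 for Q the Boolean lattice."""
    elems, covers = boolean_lattice(n)
    top = sum(1 for a in elems if len(a) == n)
    bottom = sum(1 for a in elems if len(a) == 0)
    return len(covers) - len(elems) + top + bottom - 1

def vertex(word):
    """Support of the 0/1 vertex of a maximal chain: its cover relations."""
    return {(frozenset(word[:k]), frozenset(word[:k + 1]))
            for k in range(len(word))}

def vertices_satisfy_equations(n):
    elems, _ = boolean_lattice(n)
    for word in permutations(range(1, n + 1)):
        x = vertex(word)
        # (4): exactly one cover relation of the chain starts in rank 0.
        if sum(1 for (a, b) in x if len(a) == 0) != 1:
            return False
        # (5): balance at every element of intermediate rank.
        for a in elems:
            if 0 < len(a) < n:
                leaving = sum(1 for (c, d) in x if c == a)
                entering = sum(1 for (c, d) in x if d == a)
                if leaving != entering:
                    return False
    return True

print([csiszar_dimension(n) for n in (3, 4, 5)])
print(vertices_satisfy_equations(4))
```

The three dimensions agree with the values 5, 17 and 49 quoted elsewhere in the text for $n = 3, 4, 5$.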

Remark 4.2. The maximum likelihood estimator $\hat{p}$ for the Csiszar model is a rational function
of the sufficient statistics $b$. Indeed, as for any toric model [28, Theorem 1.10], the
MLE is the unique non-negative real solution of the linear equations $A_Q \cdot \hat{p} = b$ where
$\hat{p} \in V(I_{csi})$. An explicit formula for $\hat{p}$ as a rational function in $b$ is obtained from the
corresponding formula for the $n$-chain model $A_G$ by setting the redundant sufficient statistics
to zero. This specialization works because the Csiszar model is a face of the $n$-chain model,
and all formulas are compatible with our transition from $A_G$ to $A_Q$ via $A'_G$. On the other
hand, the same idea of computing the MLE rationally by restriction no longer works for
our submodels of the Csiszar model, such as the Birkhoff model or the ascending model.
For instance, for $n = 3$, the matrix $A_Q$ is invertible and $\hat{p} = (A_Q)^{-1} b$, while the MLE for
$\mathcal{M}_{birk} = \mathcal{M}_{asc}$ requires Cardano's formula: we must solve a cubic equation to get the MLE. □

The toric ideal $I_{csi}$ of the Csiszar model is the kernel of the ring homomorphism

$$ p_\pi \mapsto d_{a_0<a_1} d_{a_1<a_2} \cdots d_{a_{n-1}<a_n} \quad \text{for } \pi = (a_0 < a_1 < \cdots < a_n). $$

The minimal generators of $I_{csi}$ form the Markov basis of $\mathcal{M}_{csi}$. As shown in the proof
of Theorem 4.1, the Csiszar model polytope $C_Q = A_Q$ inherits the equivalent conditions
(b),(c),(d),(e) in [17, Theorem 4.4] from the larger model $A_G$. In particular, the toric ideal
$I_{csi}$ has a Grobner basis consisting of quadratic binomials. We shall now describe this
Grobner basis explicitly. It generalizes the Markov basis for $Q = 2^{[n]}$ in [9, Theorem 3.1].

Theorem 4.3. A Grobner basis for the toric ideal $I_{csi}$ of the Csiszar model on a graded
poset $Q$ is given by all quadratic binomials of the form

(6) $p_{\pi_1\pi_2}\, p_{\pi_1'\pi_2'} - p_{\pi_1\pi_2'}\, p_{\pi_1'\pi_2}$,

where the chains $\pi_1$ and $\pi_1'$ have the same ending point and both $\pi_2$ and $\pi_2'$ start there.

Proof. It is easy to check that the binomial quadrics that lie in the ideal $I_{csi}$ are precisely
the quadrics (6). These are inherited from the conditional independence statements valid
for the $n$-chain graphical model $G$. These statements translate into a quadratic Grobner
basis for the toric ideal of the matrix $A_G$. The leading terms of that Grobner basis are
squarefree, so by [31, Corollary 8.9] they define a regular unimodular triangulation of the
convex hull of the columns of $A_G$. Since $C_Q = A_Q$ is a face of that polytope, that face inherits
the regular unimodular triangulation from $A_G$. We conclude that the Grobner basis which
specifies this regular triangulation of $C_Q$ consists precisely of the quadrics (6). □



The Grobner basis (6) reveals that the Csiszar model has desirable algebraic properties:

Corollary 4.4. The coordinate ring $K[p]/I_{csi}$ of the Csiszar model over any field $K$ is
Cohen-Macaulay and Koszul. Its Krull dimension equals $|\mathrm{Cov}(Q)| - |Q| + |Q_n| + |Q_0|$.

Proof. Since $I_{csi}$ has a quadratic Grobner basis, by Theorem 4.3, it follows that $K[p]/I_{csi}$
is Koszul. Again by Theorem 4.3 there is a squarefree initial ideal of $I_{csi}$. Hence by
[31, Proposition 13.15] the semigroup algebra $K[p]/I_{csi}$ is normal, and hence
Cohen-Macaulay, by Hochster's Theorem [19, Theorem 1]. The dimension of this semigroup
algebra is one more than the dimension of its polytope, given in Theorem 4.1. □



For computations it is convenient to represent the quadrics in (6) as the $2 \times 2$-minors of
certain natural matrices $M_q$ that are indexed by the elements $q$ of the poset $Q$. The row
labels of the matrix $M_q$ are the maximal chains in the order ideal $Q_{\le q} = \{a \in Q \mid a \le q\}$
and the column labels of $M_q$ are the maximal chains in the filter $Q_{\ge q} = \{b \in Q \mid q \le b\}$.
Thus $M_q$ is a matrix of format $|M(Q_{\le q})| \times |M(Q_{\ge q})|$. We define $M_q$ as follows. The entry
of $M_q$ in the row labeled $\pi_1 \in M(Q_{\le q})$ and the column labeled $\pi_2 \in M(Q_{\ge q})$ is the unknown
$p_\pi$ where $\pi$ denotes the maximal chain of $Q$ that is obtained by concatenating $\pi_1$ and $\pi_2$.

Corollary 4.5. The Markov basis of the Csiszar ideal $I_{csi}$ consists of the $2 \times 2$-minors of
the matrices $M_q$, where $q$ runs over $Q$. This Markov basis is also a Grobner basis.

Proof. Each $2 \times 2$-minor of $M_q$ has the form required in (6), and, conversely, each binomial
in (6) occurs as a $2 \times 2$-minor of $M_q$ for some $q$. Note that this element $q \in Q$ is generally
not unique for a given binomial. The Grobner basis statement is a part of Theorem 4.3. □



We illustrate our results for the case when $Q = 2^{[n]}$ is the Boolean lattice, with $n \le 6$.
For $n = 3$, the ideal $I_{csi}$ is zero as seen in Section 2. For $n = 4$, the ideal $I_{csi}$ is the complete
intersection of six quadrics, namely, the determinants of the six $2 \times 2$-matrices $M_{\{i,j\}}$.
Geometrically, these correspond to the six square faces of the 3-dimensional permutahedron:

$$
\begin{array}{rl}
I_{csi} = \langle & p_{1243}p_{2134} - p_{1234}p_{2143},\ \ p_{1342}p_{3124} - p_{1324}p_{3142},\ \ p_{1432}p_{4123} - p_{1423}p_{4132}, \\
 & p_{2341}p_{3214} - p_{2314}p_{3241},\ \ p_{2431}p_{4213} - p_{2413}p_{4231},\ \ p_{3421}p_{4312} - p_{3412}p_{4321} \ \rangle.
\end{array}
$$

We conclude that the Csiszar model for n = 4 has dimension 17, as predicted by The 



orem 4.1 As a projective variety, this model has degree 32 since it is a complete inter- 



section. For n = 5, the Markov basis consists of the 2x2-minors of the ten 2x6-matrices 
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M|i^2}, ^{1,3}, • • • 5 ^{4,5} and ten 6x2-matrices M{i_2,3}, M{i^2,4}5 • • • , ^{3,4,5}- For example, 

M|2,4} = 



/ P24135 P24153 P24315 ^24351 ^24513 P24531 \ 
\P42135 P42153 P42315 ^42351 ^42513 P42531 / 



Altogether, these matrices have 300 maximal minors but 30 of the minors occur in two
matrices, so the total number of distinct Markov basis elements is 270. The dimension of
this model is 49, and its degree equals 50493797160. The Hilbert series of $\mathbb{K}[p]/I_{\mathrm{Csi}}$ equals

$$ (1 + 70t + 2215t^2 + 42020t^3 + 534635t^4 + 4837694t^5 + 32227985t^6 + 161529320t^7 $$
$$ + 617560160t^8 + 1816401720t^9 + 4129171068t^{10} + 7265606880t^{11} + 9880962560t^{12} $$
$$ + 10337876480t^{13} + 8250364160t^{14} + 4953798656t^{15} + 2189864960t^{16} $$
$$ + 688455680t^{17} + 145162240t^{18} + 18350080t^{19} + 1048576t^{20})/(1 - t)^{50}. $$

For $n = 6$, the Markov basis is represented by the fifteen $2 \times 24$-matrices $M_{\{i,j\}}$, the
twenty $6 \times 6$-matrices $M_{\{i,j,k\}}$, and the fifteen $24 \times 2$-matrices $M_{\{i,j,k,l\}}$. Altogether, these 50
matrices have 12780 minors of size $2 \times 2$, but only 10980 of the binomial quadrics are distinct.
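
The minor counts quoted for $n = 4, 5, 6$ can be checked by brute force. The following Python sketch is an illustration only, not the authors' computation. It assumes that maximal chains of the Boolean lattice are encoded as permutations, that the rows of $M_q$ are the orderings of the subset $q$ and the columns are the orderings of its complement; the function name `csiszar_minors` is ours.

```python
from itertools import combinations, permutations

def csiszar_minors(n):
    """Count the 2x2 minors of all matrices M_q for the Boolean lattice on {1..n}.

    Returns (total, distinct): the number of minors counted over all matrices
    and the number of distinct binomials (up to sign).
    """
    items = range(1, n + 1)
    total = 0
    distinct = set()
    for k in range(n + 1):
        for q in combinations(items, k):
            rest = [i for i in items if i not in q]
            rows = list(permutations(q))      # maximal chains from the bottom up to q
            cols = list(permutations(rest))   # maximal chains from q up to the top
            for r1, r2 in combinations(rows, 2):
                for c1, c2 in combinations(cols, 2):
                    total += 1
                    # the minor p[r1 c1] p[r2 c2] - p[r1 c2] p[r2 c1]
                    diag = frozenset({r1 + c1, r2 + c2})
                    anti = frozenset({r1 + c2, r2 + c1})
                    distinct.add(frozenset({diag, anti}))
    return total, len(distinct)

for n in (4, 5, 6):
    print(n, csiszar_minors(n))
```

Under these assumptions the sketch is expected to report (6, 6), (300, 270) and (12780, 10980), in agreement with the counts stated in the text.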

A systematic way of understanding our matrices $M_q$ is furnished by Sullivant's theory
of toric fiber products [32]. This method will become crucial when studying the ascending
model in the next section, and we will explain at the end of that section how toric fiber
products can also be used to give an alternative proof of Theorem 4.3.



5. The ascending model 

At the end of [9, p. 233] it is asserted that a Markov basis for the ascending model on
$Q = 2^{[n]}$ can be obtained in a similar way as was done for the standard Csiszár model, but
no details are given. However, simple examples show that it does not suffice to consider
quadratic binomials for the generating set, and it is not clear from [9] which properties the
defining ideals of the ascending and Csiszár models have in common. The defining ideal and
the model polytope of the ascending model seem to be more complicated and more interesting
than those of the Csiszár model. These are the structures to be explored in this section.

Generalizing the notation introduced in the preceding section, for any subset $A \subseteq Q$, we
consider the set of elements of $Q$ that cover an element from $A$:

$$ \nabla A := \{ b \in Q \mid a < b \in \mathrm{Cov}(Q) \text{ for some } a \in A \}. $$

We also consider the set of elements covered by an element from $A$:

$$ \Delta A := \{ b \in Q \mid b < a \in \mathrm{Cov}(Q) \text{ for some } a \in A \}. $$

Theorem 5.1. Fix a graded poset $Q$ of rank $n$. The model polytope As of the ascending
model is the set of solutions, in the space with coordinates $x_a$ for $a \in Q$, of the equations

(7)  $\sum_{a \in Q_i} x_a = 1, \qquad 0 \le i \le n,$

and the inequalities

(8)  $x_a \ge 0, \qquad a \in Q,$

(9)  $\sum_{a \in \nabla A} x_a - \sum_{a \in A} x_a \ge 0, \qquad A \subseteq Q_i, \; 0 \le i \le n-1.$



Proof. Equations (7) are valid on every vertex of As because every maximal chain in $Q$ has
exactly one element of rank $i$ for all $0 \le i \le n$. The inequalities (9) express the fact that
if a maximal chain passes through an element of $A$ then it must also pass through a
unique element of $\nabla A$. Inequalities (8) are obviously valid for As. Hence As is contained in
the intersection of the linear spaces defined by (7) and the halfspaces defined by (8) and (9).
For the converse we proceed by induction on $n$. If $n = 0$ then As is a simplex of dimension
$|Q| - 1$, defined by (7) and (8). If $n = 1$ then the result is identical to [26, Corollary 1.8 (b)].



Assume $n \ge 2$. Let $x = (x_a)_{a \in Q} \in \mathbb{R}^Q$ be any vector satisfying (7), (8) and (9). Let $x'$
be the projection of $x$ onto the coordinates in $Q' = Q_0 \cup \cdots \cup Q_{n-1}$ and $x''$ the projection
of $x$ onto $Q'' = Q_{n-1} \cup Q_n$. By induction, $x'$ and $x''$ lie in the model polytopes of the
ascending model for $Q'$ and $Q''$. Hence we can write $x'$ and $x''$ as convex linear combinations:

$$ x' = \sum_{c' \in \mathcal{M}(Q')} \lambda_{c'}\, c' \qquad \text{and} \qquad x'' = \sum_{c'' \in \mathcal{M}(Q'')} \lambda_{c''}\, c''. $$

Here we identify $c'$ and $c''$ with the 0/1-vectors that have support $c'$ and $c''$ respectively.

Consider a fixed element $a \in Q_{n-1}$. Let $c'_1, \ldots, c'_r$ be the chains from the above expansion
of $x'$ that contain $a$ and for which $\lambda_{c'} > 0$. Let $c''_1, \ldots, c''_s$ be the chains from the above
expansion of $x''$ that contain $a$ and for which $\lambda_{c''} > 0$. The coordinate $x'_a$ of $x'$ then
equals $\sum_i \lambda_{c'_i}$ and the coordinate $x''_a$ of $x''$ equals $\sum_j \lambda_{c''_j}$. Since $x'_a$ and $x''_a$ coincide with the
coordinate $x_a$ of $x$, we have $\sum_i \lambda_{c'_i} = \sum_j \lambda_{c''_j}$. After relabeling (and possibly swapping $x'$
and $x''$) we may assume that $\lambda_{c'_1}$ is the minimum of $\{\lambda_{c'_1}, \ldots, \lambda_{c'_r}, \lambda_{c''_1}, \ldots, \lambda_{c''_s}\}$. Then we
replace $\lambda_{c''_1}$ by $\lambda_{c''_1} - \lambda_{c'_1}$. Let $c_1 \in \mathcal{M}(Q)$ be the concatenation of $c'_1$ and $c''_1$. Now set $\lambda_1 = \lambda_{c'_1}$
and proceed with the new coefficients and the chains $c'_2, \ldots, c'_r$ and $c''_1, \ldots, c''_s$. Clearly the
sums of the coefficients of $c'_2, \ldots, c'_r$ and $c''_1, \ldots, c''_s$ still coincide. Proceeding by induction
and summing over all $a \in Q_{n-1}$ for which $x_a > 0$, one constructs an expansion $\sum_i \lambda_i c_i$ in
terms of chains in $\mathcal{M}(Q)$ whose projection onto $\mathcal{M}(Q')$ equals $x'$ and whose projection onto
$\mathcal{M}(Q'')$ equals $x''$. Hence $x = \sum_i \lambda_i c_i$, and we have $\lambda_i \ge 0$ and $\sum_i \lambda_i = \sum_{a \in Q_{n-1}} x_a = 1$ by
(7). This proves that $x \in \mathrm{As}$. □
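
The gluing step in this proof is constructive. The following Python sketch illustrates it under assumptions of ours: chains are tuples of poset elements, an expansion is a dictionary from chains to positive weights, and the function name `glue` is hypothetical. Instead of repeatedly extracting the global minimum as in the proof, it walks through the two lists of chains at each element $a$ with two pointers, which realizes the same greedy idea.

```python
from collections import defaultdict
from fractions import Fraction

def glue(lower, upper):
    """Glue a convex expansion over chains of Q' with one over chains of Q''.

    lower: dict {chain in Q' (tuple ending at rank n-1): weight}
    upper: dict {chain in Q'' (tuple starting at rank n-1): weight}
    The total weight through each element a of rank n-1 must agree.
    Returns a dict {concatenated chain: weight}.
    """
    by_end = defaultdict(list)
    for chain, w in lower.items():
        by_end[chain[-1]].append([chain, Fraction(w)])
    by_start = defaultdict(list)
    for chain, w in upper.items():
        by_start[chain[0]].append([chain, Fraction(w)])

    glued = defaultdict(Fraction)
    for a, lows in by_end.items():
        ups = by_start[a]
        assert sum(w for _, w in lows) == sum(w for _, w in ups)
        i = j = 0
        while i < len(lows) and j < len(ups):
            step = min(lows[i][1], ups[j][1])
            # concatenate the two chains at their common element a
            glued[lows[i][0] + ups[j][0][1:]] += step
            lows[i][1] -= step
            ups[j][1] -= step
            if lows[i][1] == 0:
                i += 1
            if ups[j][1] == 0:
                j += 1
    return dict(glued)

half, quarter = Fraction(1, 2), Fraction(1, 4)
lower = {("s", "a"): half, ("t", "a"): quarter, ("s", "b"): quarter}
upper = {("a", "u"): quarter, ("a", "v"): half, ("b", "u"): quarter}
print(glue(lower, upper))
```

Exact rational arithmetic is used so that the stopping test `== 0` is reliable; with floating-point weights a tolerance would be needed.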



In the preceding proof, when showing that any $x$ satisfying (7)-(9) lies in As, we use (9)
only in the induction base $n = 1$. The equations (7) are complete and independent when
$Q = 2^{[n]}$ is the Boolean lattice, so in that case the dimension of the model polytope As is
equal to $2^n - n - 1$. In general the dimension is more subtle to calculate and we do not
know any good description. For example, if the induced subposet of $Q$ on the elements of
two consecutive ranks $i$ and $i + 1$ is disconnected then As is contained in each hyperplane
defined by the equality of the sum over the variables of rank $i$ and $i + 1$ in a component.
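
The dimension formula for the Boolean lattice can be checked for small $n$ by computing the affine dimension of the set of 0/1 chain vectors. The Python sketch below does this with exact Gaussian elimination. The helper names `rank` and `ascending_polytope_dim` are ours, and the encoding of chains as permutations is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations, permutations

def rank(rows):
    """Rank of a rational matrix via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            if rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def ascending_polytope_dim(n):
    """Affine dimension of the convex hull of the chain vectors of 2^[n]."""
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(1, n + 1), k)]
    index = {s: i for i, s in enumerate(subsets)}
    vertices = []
    for perm in permutations(range(1, n + 1)):
        v = [0] * len(subsets)
        for k in range(n + 1):
            v[index[frozenset(perm[:k])]] = 1   # the chain passes through this subset
        vertices.append(v)
    base = vertices[0]
    diffs = [[a - b for a, b in zip(v, base)] for v in vertices[1:]]
    return rank(diffs)

for n in (2, 3, 4):
    print(n, ascending_polytope_dim(n), 2**n - n - 1)
```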

Now we turn to the toric ideal $I_{\mathrm{asc}}$ of the ascending model. It is the kernel of the map

(10)  $\mathbb{K}[p] \to \mathbb{K}[t], \qquad p_\pi \mapsto t_{a_0} t_{a_1} \cdots t_{a_n} \quad \text{for } \pi = (a_0 < \cdots < a_n) \in \mathcal{M}(Q).$

If $\mathrm{rk}(Q) = 0$ then this map is injective and $I_{\mathrm{asc}} = \{0\}$, so we assume $\mathrm{rk}(Q) \ge 1$ from
now on. The case $\mathrm{rk}(Q) = 1$ serves as the base for our inductive constructions. Here
the poset $Q$ is identified with a bipartite graph on $Q_0$ and $Q_1$, and the monomial map
$p_\pi \mapsto t_{a_0} t_{a_1}$ defines the toric ring associated with a bipartite graph in commutative algebra.
A generating set of the kernel of this map was determined in [27, Lemma 1.1] and shown
to be a universal Gröbner basis in [33, Proposition 8.1.10]. This result has already proven
to be useful in algebraic statistics (see e.g. [14]).

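Lemma 5.2 (Ohsugi-Hibi [27], Villarreal [33]). Let $Q$ be a graded poset of rank 1. Then
a universal Gröbner basis of the toric ideal $I_{\mathrm{asc}}$ consists of all cycles in $Q$, expressed as
binomials

$$ p_{a_0 < a_1}\, p_{a_2 < a_3} \cdots p_{a_{2s-2} < a_{2s-1}} \; - \; p_{a_2 < a_1}\, p_{a_4 < a_3} \cdots p_{a_{2s} < a_{2s-1}}, $$

where $a_{2s} = a_0$ and the $a_i$ are pairwise distinct otherwise.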

Lemma 1.1 in [27] and Proposition 8.1.10 in [33] are actually formulated in a slightly
different language. For a graph $G = (V, E)$ with vertex set $V$ and edge set $E$ one considers
two polynomial rings, one where the variables are indexed by the edges of the graph and
one where the variables are indexed by the vertices. Now the edge variables are mapped to
the product of the corresponding vertex variables. If the graph is bipartite with bipartition
$V = V_0 \cup V_1$ then one can consider it as a graded poset of rank 1 where vertices from $V_0$
are covered by their neighbors in $V_1$. Of course, the roles of $V_0$ and $V_1$ can also be reversed.
Thus the edge variables represent variables indexed by the maximal chains, and the kernel
of the map to the corresponding product of vertices coincides with the toric ideal $I_{\mathrm{asc}}$.
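
To make the dictionary between cycles and binomials concrete, here is a small Python sketch. It is an illustration under our own conventions: a cycle is given as a list $[a_0, a_1, \ldots, a_{2s-1}]$ alternating between rank 0 and rank 1, an edge variable is a pair (lower vertex, upper vertex), and the names `cycle_binomial` and `image` are ours. The check confirms that both monomials of the cycle binomial have the same image under the edge-to-vertex map, so the binomial lies in the kernel.

```python
from collections import Counter

def cycle_binomial(cycle):
    """Return the two monomials (lists of edges) of the binomial of a cycle.

    cycle = [a0, a1, ..., a_{2s-1}], with even positions of rank 0 and odd
    positions of rank 1; the cycle closes up with a_{2s} = a0.
    """
    m = len(cycle)
    assert m % 2 == 0 and m >= 4
    plus = [(cycle[2 * j], cycle[2 * j + 1]) for j in range(m // 2)]
    minus = [(cycle[(2 * j + 2) % m], cycle[2 * j + 1]) for j in range(m // 2)]
    return plus, minus

def image(monomial):
    """Exponent vector of the image monomial in the vertex variables."""
    return Counter(v for edge in monomial for v in edge)

# The hexagon formed by ranks 1 and 2 of the Boolean lattice on {1, 2, 3}.
hexagon = ["1", "12", "2", "23", "3", "13"]
plus, minus = cycle_binomial(hexagon)
print(plus, minus, image(plus) == image(minus))
```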

Now we are in a position to describe a Gröbner basis for $I_{\mathrm{asc}}$ when $\mathrm{rank}(Q) > 1$.

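Theorem 5.3. A Gröbner basis for the toric ideal $I_{\mathrm{asc}}$ of the ascending model on a graded
poset $Q$ of rank $n$ is given by two classes of binomials. The first class consists of the quadrics

(11)  $p_{\pi_1} \cdot p_{\pi_2} - p_{\tilde\pi_1} \cdot p_{\tilde\pi_2},$

where $\pi_1, \tilde\pi_1, \pi_2, \tilde\pi_2$ are distinct chains of at least three elements, such that $\pi_1 \cup \pi_2 = \tilde\pi_1 \cup \tilde\pi_2$
as multisets and $\pi_1 \cap \pi_2 = \tilde\pi_1 \cap \tilde\pi_2$ is nonempty. The second class consists of all binomials

(12)  $p_{\pi_1} p_{\pi_2} \cdots p_{\pi_s} - p_{\tilde\pi_1} p_{\tilde\pi_2} \cdots p_{\tilde\pi_s},$

where $\pi_1, \tilde\pi_1, \ldots, \pi_s, \tilde\pi_s$ are constructed as follows: Choose $i \in \{0, 1, \ldots, n-1\}$ and take any
cycle $\gamma = (a_0 < a_1 > a_2 < \cdots < a_{2s-1} > a_{2s} = a_0)$ in the subposet $Q_{i,i+1}$ of all elements having
rank $i$ or $i + 1$ in $Q$. Then the maximal chains $\pi_j, \tilde\pi_j$ for $1 \le j \le s$ (with the indices of the $a_k$ read modulo $2s$) are chosen such that

$$ \pi_j = ( u_{j,0} = \tilde u_{j,0} < \cdots < u_{j,i} = \tilde u_{j,i} = a_{2j} < a_{2j+1} = u_{j,i+1} < \cdots < u_{j,n} ) $$

and

$$ \tilde\pi_j = ( u_{j,0} = \tilde u_{j,0} < \cdots < u_{j,i} = \tilde u_{j,i} = a_{2j} < a_{2j-1} = \tilde u_{j,i+1} < \cdots < \tilde u_{j,n} ), $$

and the multisets $\{u_{j,\ell} \mid 1 \le j \le s,\; i < \ell \le n\}$ and $\{\tilde u_{j,\ell} \mid 1 \le j \le s,\; i < \ell \le n\}$ coincide.

In Figure 1 we give a visual description of the binomial (12).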

For the proof of this result we shall employ Sullivant's theory of toric fiber products
from [32]. We briefly review that theory. Consider two polynomial rings $\mathbb{K}[p']$ and $\mathbb{K}[p'']$
and a surjective multigrading $\phi : \{p'\} \cup \{p''\} \to \mathcal{A} \subset \mathbb{R}^d$, called the $\mathcal{A}$-grading. Then choose
new variables $z_{\pi,\tau}$ for all $\pi \in \{p'\}$ and $\tau \in \{p''\}$ such that $\phi(\pi) = \phi(\tau)$. For ideals $I$ in
$\mathbb{K}[p']$ and $J$ in $\mathbb{K}[p'']$ that are $\mathcal{A}$-homogeneous, we let $I \times_{\mathcal{A}} J$ denote the kernel of the map
$z_{\pi,\tau} \mapsto p'_\pi \otimes p''_\tau$ from $\mathbb{K}[z]$ to the tensor product $\mathbb{K}[p']/I \otimes \mathbb{K}[p'']/J$.

Figure 1. A binomial in the Gröbner basis of the ascending model



In order to describe a Gröbner basis of $I \times_{\mathcal{A}} J$ in terms of Gröbner bases of $I$ and $J$, the
concept of lifting monomials turns out to be crucial [32, p. 567]. A lift of a variable $p'_\pi$ is $z_{\pi,\tau}$
for some $\tau$ with $\phi(\pi) = \phi(\tau)$. Now assume that $\mathcal{A}$ is linearly independent. Let $f \in \mathbb{K}[p']$
be an $\mathcal{A}$-homogeneous polynomial. Each monomial $m$ in $f$ factors as $m_{a_1} \cdots m_{a_r}$ where
$\mathcal{A} = \{a_1, \ldots, a_r\}$ and $\phi(m_{a_i}) = \deg(m_{a_i})\, a_i$. Moreover, since $\mathcal{A}$ is linearly independent,
each monomial $m$ in $f$ gives the same number $d_i := \deg(m_{a_i})$ of variables of degree $a_i$
(counted with multiplicity). Now choose a multiset of $d_i$ variables $p''_\tau$ of degree $a_i$. A lift of
$f$ is then any polynomial obtained from the above choices when lifting the variables in each
monomial from $f$ in such a way that for all monomials the chosen multisets are exhausted.



Proof. We proceed by induction on $n = \mathrm{rank}(Q)$. If $n = 1$ then (11) describes an empty
set of binomials and the set in (12) coincides with the Gröbner basis given in Lemma 5.2.



Now assume $n \ge 2$. As in the proof of Theorem 5.1 we split $Q$ into the subposet
$Q' = Q_0 \cup \cdots \cup Q_{n-1}$ consisting of ranks $0, \ldots, n-1$ and the bipartite poset $Q'' = Q_{n-1} \cup Q_n$
consisting of ranks $n-1$ and $n$. Assume $Q_{n-1} = \{a_1, \ldots, a_r\}$. Any chain in $\mathcal{M}(Q')$ ends
in an element from $Q_{n-1}$, and any chain from $\mathcal{M}(Q'')$ starts in an element from $Q_{n-1}$.
We consider the polynomial ring $\mathbb{K}[p']$ with variables $p'_\pi$ for $\pi \in \mathcal{M}(Q')$ and $\mathbb{K}[p'']$ with
variables $p''_\pi$ for $\pi \in \mathcal{M}(Q'')$. Then we grade $p'_\pi$ by $e_j \in \mathbb{R}^r$ if $\pi$ ends in $a_j$ and $p''_\pi$ by $e_j \in \mathbb{R}^r$
if $\pi$ begins in $a_j$. Note that the set of degrees $\mathcal{A} = \{e_1, \ldots, e_r\}$ is linearly independent.

We write $I'_{\mathrm{asc}}$ for the ideal of the ascending model of $Q'$ and $I''_{\mathrm{asc}}$ for the ideal of the
ascending model of $Q''$. The toric ideal of interest to us is the fiber product $I_{\mathrm{asc}} = I'_{\mathrm{asc}} \times_{\mathcal{A}} I''_{\mathrm{asc}}$.
Since $\mathcal{A}$ is linearly independent, we can apply [32, Theorem 12] and the induction hypothesis
to prove the claim. Sullivant's result tells us that a Gröbner basis of $I_{\mathrm{asc}}$ can be found by
lifting Gröbner bases of the ideals $I'_{\mathrm{asc}}$ and $I''_{\mathrm{asc}}$ and by adding some quadratic relations.

By induction, $I'_{\mathrm{asc}}$ has a Gröbner basis $\mathcal{G}'$ consisting of elements (11) and (12). We shall
lift these to binomials in $I_{\mathrm{asc}}$. Likewise, $I''_{\mathrm{asc}}$ has a Gröbner basis $\mathcal{G}''$ consisting of elements
(12). There are no binomials of type (11) in $I''_{\mathrm{asc}}$ because the poset $Q''$ has only rank 1.

Lifting (11): Let $p_{\pi_1} p_{\pi_2} - p_{\tilde\pi_1} p_{\tilde\pi_2}$ be a quadric (11) in $\mathcal{G}'$. Since it is $\mathcal{A}$-homogeneous, the
multisets of endpoints of $\pi_1, \pi_2$ and $\tilde\pi_1, \tilde\pi_2$ coincide. Suppose $\pi_1$ and $\tilde\pi_1$ have the same
endpoint. In the lifting described above we need to distinguish two cases.

Case 1: $\pi_1$ and $\pi_2$ end in different endpoints. Then, for any two maximal chains $\tau_1, \tau_2$ in
$Q''$ starting in the endpoints of $\pi_1$ and $\pi_2$ respectively, the unique lift for these choices is

(13)  $p_{\pi_1 \tau_1} \cdot p_{\pi_2 \tau_2} - p_{\tilde\pi_1 \tau_1} \cdot p_{\tilde\pi_2 \tau_2} \in I_{\mathrm{asc}}.$

Case 2: $\pi_1$ and $\pi_2$ end in the same endpoint. Then, for any two chains $\tau_1, \tau_2$ in $Q''$ starting
in the common endpoint of $\pi_1$ and $\pi_2$, besides the lift (13) we also have the lift

(14)  $p_{\pi_1 \tau_1} \cdot p_{\pi_2 \tau_2} - p_{\tilde\pi_1 \tau_2} \cdot p_{\tilde\pi_2 \tau_1} \in I_{\mathrm{asc}}.$

One easily checks that the binomials from (13) and (14) satisfy the conditions from (11).



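Lifting (12): First consider a binomial $p_{\pi_1} \cdots p_{\pi_s} - p_{\tilde\pi_1} \cdots p_{\tilde\pi_s}$ of type (12) in the Gröbner
basis $\mathcal{G}'$. Since it is $\mathcal{A}$-homogeneous, the multisets $\{\phi(\pi_1), \ldots, \phi(\pi_s)\}$ and $\{\phi(\tilde\pi_1), \ldots, \phi(\tilde\pi_s)\}$
coincide. Now choose maximal chains $\pi''_1, \ldots, \pi''_s$ from $Q''$ with the same multiset of
$\mathcal{A}$-degrees $\{\phi(\pi''_1), \ldots, \phi(\pi''_s)\}$, labeled so that $\phi(\pi''_j) = \phi(\pi_j)$. Note that the $\pi''_j$ are just single
cover relations. For any permutation $\gamma \in S_s$ such that $\phi(\tilde\pi_j) = \phi(\pi''_{\gamma(j)})$ for all $j$, the binomial

$$ p_{\pi_1 \pi''_1} \cdots p_{\pi_s \pi''_s} \; - \; p_{\tilde\pi_1 \pi''_{\gamma(1)}} \cdots p_{\tilde\pi_s \pi''_{\gamma(s)}} $$

lies in $I_{\mathrm{asc}}$ and is of type (12).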



We next consider a binomial $p_{\pi_1} \cdots p_{\pi_s} - p_{\tilde\pi_1} \cdots p_{\tilde\pi_s}$ of type (12) in the Gröbner basis $\mathcal{G}''$.
The proof is analogous to the previous case, but the multiset of $\mathcal{A}$-degrees $\{\phi(\pi_1), \ldots, \phi(\pi_s)\}$
$= \{\phi(\tilde\pi_1), \ldots, \phi(\tilde\pi_s)\}$ here is actually a set. Choosing a set $\{\pi'_1, \ldots, \pi'_s\}$ of maximal chains
from $Q'$ for which $\{\phi(\pi_1), \ldots, \phi(\pi_s)\}$ and $\{\phi(\pi'_1), \ldots, \phi(\pi'_s)\}$ coincide leads to a unique lift
that is of type (12). All the binomials constructed by these liftings from $\mathcal{G}'$ and $\mathcal{G}''$ are
among the binomials described in (11) and (12) for the ideal $I_{\mathrm{asc}}$ we seek to generate.

Finally, we add the quadratic binomials $p_{\pi'_1 \pi''_1}\, p_{\pi'_2 \pi''_2} - p_{\pi'_1 \pi''_2}\, p_{\pi'_2 \pi''_1}$ for all maximal chains
$\pi'_1, \pi'_2 \in \mathcal{M}(Q')$ and $\pi''_1, \pi''_2 \in \mathcal{M}(Q'')$ whose $\mathcal{A}$-degrees coincide. These binomials lie in $I_{\mathrm{asc}}$
and they have type (11).

We have shown that the liftings of the Gröbner bases $\mathcal{G}'$ for $I'_{\mathrm{asc}}$ and $\mathcal{G}''$ for $I''_{\mathrm{asc}}$, plus the
additional quadrics, are a subset of the binomials described in (11) and (12). Using
[32, Theorem 12], we conclude that the binomials from (11) and (12) form a Gröbner basis of
$I_{\mathrm{asc}}$. Actually, the following converse is true as well: all binomials (11) and (12) in $I_{\mathrm{asc}}$ arise
from $I'_{\mathrm{asc}}$ and $I''_{\mathrm{asc}}$ using the lifting procedure we described. □



Corollary 5.4. The toric algebra $\mathbb{K}[p]/I_{\mathrm{asc}}$ is normal and Cohen-Macaulay.

Proof. Theorem 5.3 gave a Gröbner basis for $I_{\mathrm{asc}}$ whose leading monomials are squarefree.
This shows that $\mathbb{K}[p]/I_{\mathrm{asc}}$ is normal. Hochster's Theorem [19, Theorem 1] implies
Cohen-Macaulayness. □



We could also give an alternative proof of Theorem 4.3 using toric fiber products. Namely,
the toric algebra $\mathbb{K}[p]/I_{\mathrm{Csi}}$ can be obtained as an iterated toric fiber product of suitably
graded smaller polynomial rings that are attached to the pieces in a decomposition of $Q$
into antichains. The matrices $M_q$ introduced after the proof of Theorem 4.3 represent the
"glueing quadrics" used for constructing larger toric ideals from smaller ones.

We close with some brief remarks on the ascending model for the Boolean lattice $Q = 2^{[n]}$.
In Section 2 we saw that, for $n = 3$, the ideal $I_{\mathrm{asc}}$ is principal with generator
$p_{123}p_{231}p_{312} - p_{132}p_{213}p_{321}$. This cubic is of type (12). It represents the unique cycle in the
hexagon $Q_{1,2}$. For $n = 4$, the minimal Markov basis of the ascending model consists of 6 quadrics, 64
cubics and 93 quartics. Thus, here we encounter binomials of both types (11) and (12).
The Hilbert series of the Cohen-Macaulay ring $\mathbb{K}[p]/I_{\mathrm{asc}}$ for $Q = 2^{[4]}$ is found to be

$$ \frac{1 + 12t + 72t^2 + 228t^3 + 291t^4 + 168t^5 + 36t^6}{(1 - t)^{12}}. $$
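
The low-degree part of these statements can be checked by enumerating fibers of the monomial map (10). The Python sketch below is our own illustration: chains of the Boolean lattice are encoded as permutations, monomials of a fixed degree are grouped by their image, and the sum of (fiber size minus one) over all fibers gives the number of linearly independent binomials of that degree in $I_{\mathrm{asc}}$. The names `chain_image` and `fiber_excess` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations

def chain_image(perm):
    """Proper nonempty subsets on the maximal chain encoded by perm.

    The empty set and the full set lie on every chain, so they are omitted.
    """
    return tuple(frozenset(perm[:k]) for k in range(1, len(perm)))

def fiber_excess(n, degree):
    """Dimension of the degree-`degree` part of the toric ideal for 2^[n]."""
    perms = list(permutations(range(1, n + 1)))
    fibers = Counter()
    for monomial in combinations_with_replacement(perms, degree):
        img = Counter(a for perm in monomial for a in chain_image(perm))
        fibers[frozenset(img.items())] += 1
    return sum(size - 1 for size in fibers.values())

print(fiber_excess(3, 2), fiber_excess(3, 3), fiber_excess(4, 2))
```

For $n = 3$ this should find no quadrics and a single cubic, and for $n = 4$ six independent quadrics, consistent with the Markov basis described above.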
6. The inversion model

The inversion model is defined only in the case when $Q$ is the distributive lattice
associated with a constraint poset $\mathcal{P}$ on $[n]$. The maximal chains in $Q$ correspond to linear
extensions $\pi \in \mathcal{L}(\mathcal{P})$ of the constraint poset. These are the permutations $\pi \in S_n$ that
are compatible with $\mathcal{P}$. Fix unknowns $u_{ij}$ and $v_{ij}$ for $1 \le i < j \le n$. Algebraically, the
inversion model is defined by the toric ideal which is the kernel of the monomial map

$$ p_\pi \;\mapsto \prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) < \pi^{-1}(j)}} u_{ij} \;\; \prod_{\substack{1 \le i < j \le n \\ \pi^{-1}(i) > \pi^{-1}(j)}} v_{ij}. $$

We begin by considering the unconstrained inversion model. By this we mean the case when
$\mathcal{P}$ is an $n$-element antichain, so there are no constraints at all. In that unconstrained case,
we have $Q = 2^{[n]}$ and our state space $\mathcal{M}(Q) = S_n = \mathcal{L}(\mathcal{P})$ consists of all $n!$ permutations.

The Mallows model [25] is a natural specialization of the unconstrained inversion model
to a single parameter $q$. It is obtained by setting $u_{ij} := 1$ and $v_{ij} := q$. So, in this model,
the probability of observing the permutation $\pi$ is $P(\pi) = Z^{-1} q^{|\mathrm{inv}(\pi)|}$, where

$$ \mathrm{inv}(\pi) = \{ (i,j) : 1 \le i < j \le n, \; \pi^{-1}(i) > \pi^{-1}(j) \} $$

is the set of inversions of $\pi$, and $Z$ is a normalizing constant. In contrast, our inversion
model permits different parameters for the various inversions occurring in a permutation.
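
As a small illustration of the Mallows specialization, the following Python sketch computes the probabilities $Z^{-1} q^{|\mathrm{inv}(\pi)|}$ exactly. It assumes that a permutation is written as a word, so that $\pi^{-1}(i)$ is the position of the item $i$ in that word; the function names `inversions` and `mallows` are ours.

```python
from fractions import Fraction
from itertools import permutations

def inversions(word):
    """Pairs (i, j) with i < j such that item j is placed before item i."""
    pos = {item: k for k, item in enumerate(word)}
    n = len(word)
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if pos[i] > pos[j]]

def mallows(n, q):
    """Return (probabilities, Z) for the Mallows model on S_n with parameter q."""
    weights = {w: q ** len(inversions(w)) for w in permutations(range(1, n + 1))}
    Z = sum(weights.values())
    return {w: weight / Z for w, weight in weights.items()}, Z

probs, Z = mallows(3, Fraction(1, 2))
print(Z, probs[(1, 2, 3)], probs[(3, 2, 1)])
```

For $n = 3$ the normalizing constant is $1 + 2q + 2q^2 + q^3 = (1 + q)(1 + q + q^2)$, which equals $21/8$ at $q = 1/2$.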

The model polytope for the unconstrained inversion model is a familiar object in
combinatorial optimization, where it is known as the linear ordering polytope [15, 18]. It is known
that optimizing a general linear function over the linear ordering polytope is an NP-hard
problem [18]. This mirrors the fact that the facial structure of this polytope is very
complicated and a complete description appears out of reach. As a result of this, we expect the
toric rings associated with the inversion models to be more complicated than those studied
in the previous two sections. Our study was limited to finding some computational results.

Theorem 6.1. For $n \le 6$ the toric ring of the unconstrained inversion model is normal
and hence Cohen-Macaulay. For $n \le 5$ it is Gorenstein and its Markov basis consists of
quadrics. For $n = 6$ it is not Gorenstein and there exists a Markov basis element of degree 3.

Proof. Computations using 4ti2 [16] show that the Markov basis for $n = 3, 4, 5$ consists
of 2, 81, 3029 quadratic binomials. We do not know whether there is a quadratic Gröbner
basis for $n = 5$, or whether the ring is Koszul. The Hilbert series for $n \le 5$ are

  n    Hilbert series
  3    $(1 + 2t + t^2)/(1 - t)^4$
  4    $(1 + 17t + 72t^2 + 72t^3 + 17t^4 + t^5)/(1 - t)^7$
  5    $(1 + 109t + 2966t^2 + 22958t^3 + 61026t^4 + 61026t^5 + 22958t^6 + 2966t^7 + 109t^8 + t^9)/(1 - t)^{11}$

All three numerator polynomials are symmetric. Using Normaliz [5] one checks that the
toric ring is normal in each case. Hochster's Theorem [19] implies that it is Cohen-Macaulay.
The Gorenstein property now follows from the general result that any Cohen-Macaulay
domain whose Hilbert series has a symmetric numerator polynomial is Gorenstein.

For $n = 6$, the computations are much harder, and they reveal that the above nice
properties no longer hold. The software also found that the Hilbert series of this unconstrained
inversion model is the product of $1/(1 - t)^{16}$ and the remarkable numerator polynomial

$$ 1 + 704t + 117783t^2 + 5125328t^3 + 76415229t^4 $$
$$ + 475189840t^5 + 1372165343t^6 + 1943081264t^7 + 1372165343t^8 + 475189840t^9 $$
$$ + 76416069t^{10} + 5127008t^{11} + 118623t^{12} + 704t^{13} + t^{14}. $$

This polynomial is close to symmetric but not symmetric, so the ring is not Gorenstein.
In addition to 130377 quadrics, a Markov basis for $n = 6$ must contain the cubic binomial

(15)  $p_{123456}\, p_{123645}\, p_{416253} - p_{123465}\, p_{162345}\, p_{412536}.$

Indeed, a computation shows that these are the only two cubic monomials in the fiber given
by the multiset of inversions $\{(1,4), (2,4), (2,6), (3,4), (3,5), (3,6), (4,6), (5,6), (5,6)\}$. □
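
Two of the facts used in this proof are easy to confirm independently. The Python sketch below, which is ours and not the 4ti2 computation of the paper, checks that both monomials of the cubic (15) have the stated multiset of inversions, and counts the linearly independent quadrics for small $n$ by grouping pairs of permutations according to their inversion multiset. Permutations are treated as words and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations

def inversions(word):
    """Pairs (i, j) with i < j such that item j is placed before item i."""
    pos = {item: k for k, item in enumerate(word)}
    items = sorted(word)
    return [(i, j) for a, i in enumerate(items) for j in items[a + 1:]
            if pos[i] > pos[j]]

def fiber(words):
    """Multiset of inversions of a monomial, given as a list of words."""
    return Counter(inv for w in words for inv in inversions(w))

def as_word(s):
    return tuple(int(ch) for ch in s)

lhs = [as_word(s) for s in ("123456", "123645", "416253")]
rhs = [as_word(s) for s in ("123465", "162345", "412536")]
stated = Counter([(1, 4), (2, 4), (2, 6), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (5, 6)])
print(fiber(lhs) == fiber(rhs) == stated)

def quadric_count(n):
    """Number of linearly independent quadratic binomials in the toric ideal."""
    perms = list(permutations(range(1, n + 1)))
    fibers = Counter()
    for pair in combinations_with_replacement(perms, 2):
        fibers[frozenset(fiber(pair).items())] += 1
    return sum(size - 1 for size in fibers.values())

print(quadric_count(3), quadric_count(4))
```

Since the degree is fixed, the exponents of the $u_{ij}$ are determined by those of the $v_{ij}$, so grouping by inversion multisets is enough to identify the fibers.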

A complete description of the binomial quadrics in a Markov basis was recently found by
Katthän [23]. However, the problem of characterizing a full Markov basis is wide open.

We do not know whether normality holds for $n \ge 7$, but we suspect not. To address
this question, we return to the general situation of an underlying constraint poset $\mathcal{P}$. The
states $\pi$ of the $\mathcal{P}$-constrained inversion model are elements of the subset $\mathcal{L}(\mathcal{P}) \subseteq S_n$. This
inclusion corresponds to passing to some coordinate hyperplanes in the ambient space of
the model polytopes. Therefore, the model polytope for the $\mathcal{P}$-constrained model is a face
of the model polytope for the unconstrained model. Hence, to answer our question about
normality for $n \ge 7$, it could suffice to show that the toric ring for $\mathcal{P}$ is not normal.

At present our state of knowledge about the $\mathcal{P}$-constrained inversion models is rather
limited. We do not yet even have a useful formula for the dimension of its model polytope.
By contrast, the dimension of the unconstrained model equals $\binom{n}{2}$, as this is the dimension
of the linear ordering polytope. This was shown, for example, in [30, Proposition 3.10].

We wish to mention a family of constraint posets that is important for applications
of statistical ranking in data mining, e.g. in recent work of Cheng et al. [8]. For that
application one would take $\mathcal{P}$ to be any disjoint union of a chain and an antichain.

Example 6.2. Let $n \ge 4$ and $\mathcal{P}$ be the poset consisting of the 3-chain $1 < 2 < 3$ and $n - 3$
incomparable elements. If $n = 4$ then $\mathcal{L}(\mathcal{P}) = \{1234, 1243, 1423, 4123\}$ and the toric ideal
$I_{\mathrm{inv}}$ is the zero ideal in the polynomial ring in four unknowns. If $n = 5$ then the number of
states is 20 and the model polytope has dimension 7, degree 82, and the Hilbert series is

$$ \frac{1 + 12t + 38t^2 + 28t^3 + 3t^4}{(1 - t)^8}. $$

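The Markov basis for this $\mathcal{P}$-constrained model consists of 40 quadrics:

$$ \begin{array}{ll}
p_{41523}p_{51423} - p_{14523}p_{54123}, & p_{41253}p_{51423} - p_{14253}p_{54123}, \\
p_{41235}p_{51423} - p_{14235}p_{54123}, & p_{41253}p_{51243} - p_{12453}p_{54123}, \\
p_{41235}p_{51243} - p_{12435}p_{54123}, & p_{15423}p_{51243} - p_{15243}p_{51423}, \\
p_{14253}p_{51243} - p_{12453}p_{51423}, & p_{14235}p_{51243} - p_{12435}p_{51423}, \\
p_{41235}p_{51234} - p_{12345}p_{54123}, & p_{15423}p_{51234} - p_{15234}p_{51423}, \\
p_{15243}p_{51234} - p_{15234}p_{51243}, & p_{14235}p_{51234} - p_{12345}p_{51423}, \\
p_{12543}p_{51234} - p_{12534}p_{51243}, & p_{12435}p_{51234} - p_{12345}p_{51243}, \\
p_{15423}p_{45123} - p_{14523}p_{54123}, & p_{15243}p_{45123} - p_{41523}p_{51243}, \\
p_{15234}p_{45123} - p_{41523}p_{51234}, & p_{12543}p_{45123} - p_{12453}p_{54123}, \\
p_{12534}p_{45123} - p_{41253}p_{51234}, & p_{12354}p_{45123} - p_{12345}p_{54123}, \\
p_{15243}p_{41253} - p_{12543}p_{41523}, & p_{15234}p_{41253} - p_{12534}p_{41523}, \\
p_{14523}p_{41253} - p_{14253}p_{41523}, & p_{15234}p_{41235} - p_{12354}p_{41523}, \\
p_{14523}p_{41235} - p_{14235}p_{41523}, & p_{14253}p_{41235} - p_{14235}p_{41253}, \\
p_{12534}p_{41235} - p_{12354}p_{41253}, & p_{12453}p_{41235} - p_{12435}p_{41253}, \\
p_{14253}p_{15243} - p_{12453}p_{15423}, & p_{14235}p_{15243} - p_{12435}p_{15423}, \\
p_{14235}p_{15234} - p_{12345}p_{15423}, & p_{12543}p_{15234} - p_{12534}p_{15243}, \\
p_{12435}p_{15234} - p_{12345}p_{15243}, & p_{12543}p_{14523} - p_{12453}p_{15423}, \\
p_{12534}p_{14523} - p_{14253}p_{15234}, & p_{12354}p_{14523} - p_{12345}p_{15423}, \\
p_{12534}p_{14235} - p_{12354}p_{14253}, & p_{12453}p_{14235} - p_{12435}p_{14253}, \\
p_{12435}p_{12534} - p_{12345}p_{12543}, & p_{12354}p_{12453} - p_{12345}p_{12543}.
\end{array} $$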



It can be asked which $\mathcal{P}$-constrained inversion models have a Markov basis of quadrics
and, more generally, which degrees appear in a Markov basis. We confirmed the quadratic
Markov basis for all posets $\mathcal{P}$ on $n \le 4$ elements, all on $n = 5$ elements arising by adding
one incomparable element to a poset on 4 elements, and all unconstrained models for $n \le 5$.

Interestingly, the notion of inversion model changes if we define $i < j$ to be an inversion
if $\pi(i) > \pi(j)$. The latter can be seen as a homogeneous Babington-Smith model from
The defining monomial map for this alternative inversion model equals

$$ p_\pi \;\mapsto \prod_{\substack{1 \le i < j \le n \\ \pi(i) < \pi(j)}} u_{ij} \;\; \prod_{\substack{1 \le i < j \le n \\ \pi(i) > \pi(j)}} v_{ij} \qquad \text{for } \pi \in \mathcal{L}(\mathcal{P}). $$



For the 3-chain $1 < 2 < 3$ with two incomparable elements, the Markov basis now consists of

$$ \begin{array}{ll}
p_{15243}p_{51423} - p_{12543}p_{54123}, & p_{15234}p_{51423} - p_{12534}p_{54123}, \\
p_{15423}p_{51243} - p_{12543}p_{54123}, & p_{15234}p_{51243} - p_{12354}p_{54123}, \\
p_{12534}p_{51243} - p_{12354}p_{51423}, & p_{15423}p_{51234} - p_{12534}p_{54123}, \\
p_{15243}p_{51234} - p_{12354}p_{54123}, & p_{15234}p_{51234} - p_{12345}p_{54123}, \\
p_{12543}p_{51234} - p_{12354}p_{51423}, & p_{12534}p_{51234} - p_{12345}p_{51423}, \\
p_{12354}p_{51234} - p_{12345}p_{51243}, & p_{12534}p_{15243} - p_{12354}p_{15423}, \\
p_{12543}p_{15234} - p_{12354}p_{15423}, & p_{12534}p_{15234} - p_{12345}p_{15423}, \\
p_{12354}p_{15234} - p_{12345}p_{15243}, & p_{12354}p_{12534} - p_{12345}p_{12543}, \\
p_{12435}p_{12453} - p_{12345}p_{12543}, &
\end{array} $$

and $p_{14235}p_{14253}p_{14523} - p_{12345}p_{15243}p_{15423}$, and $p_{41235}p_{41253}p_{41523}p_{45123} - p_{12345}p_{51243}p_{51423}p_{54123}$.



So, unlike in Example 6.2, this Markov basis is not quadratic. The Hilbert series equals

$$ \frac{1 + 9t + 28t^2 + 51t^3 + 66t^4 + 63t^5 + 44t^6 + 21t^7 + 5t^8}{(1 - t)^{11}}. $$

Note that, if $\mathcal{L}(\mathcal{P})$ is closed under taking inverses, then this model coincides with the
usual $\mathcal{P}$-constrained inversion model up to a relabeling. This holds for the unconstrained
inversion model. All examples tested in this alternative model had normal model polytopes.
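
The two inversion conventions can be compared on the example of the 3-chain with two incomparable elements. The Python sketch below is our own illustration with hypothetical helper names. It lists the linear extensions as words, and for each convention counts the linearly independent quadratic binomials by grouping pairs of states with the same multiset of inversions.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations

def linear_extensions(n, relations):
    """Words on 1..n in which a appears before b for every relation (a, b)."""
    return [w for w in permutations(range(1, n + 1))
            if all(w.index(a) < w.index(b) for a, b in relations)]

def value_inversions(w):
    """Pairs of items i < j with j placed before i (the convention of the inversion model)."""
    pos = {item: k for k, item in enumerate(w)}
    n = len(w)
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if pos[i] > pos[j]]

def position_inversions(w):
    """Pairs of positions i < j with w[i] > w[j] (the alternative convention)."""
    n = len(w)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if w[i] > w[j]]

def independent_quadrics(states, inversions):
    """Sum over all degree-2 fibers of (fiber size - 1)."""
    fibers = Counter()
    for pair in combinations_with_replacement(states, 2):
        key = Counter(inv for w in pair for inv in inversions(w))
        fibers[frozenset(key.items())] += 1
    return sum(size - 1 for size in fibers.values())

states = linear_extensions(5, [(1, 2), (2, 3)])
print(len(states),
      independent_quadrics(states, value_inversions),
      independent_quadrics(states, position_inversions))
```

With 20 states, the first convention should give 40 independent quadrics and the alternative convention 17, matching the two lists of quadrics displayed in this section.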

7. Plackett-Luce model and Bradley-Terry model
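The Plackett-Luce model is a non-toric model on the set $\mathcal{L}(\mathcal{P})$ of permutations $\pi \in S_n$
that are consistent with a given constraint poset $\mathcal{P}$ on $[n]$. It can be defined by the map

(16)  $p_\pi \;\mapsto\; \prod_{i=1}^{n-1} \frac{1}{\sum_{j=1}^{i} \theta_{\pi(j)}} \qquad \text{for } \pi \in \mathcal{L}(\mathcal{P}).$

We denote this model by $\mathrm{PL}_{\mathcal{P}}$ and its homogeneous ideal by $I_{\mathrm{PL}_{\mathcal{P}}}$. Thus $I_{\mathrm{PL}_{\mathcal{P}}}$ is the kernel
of the ring map $\mathbb{R}[p_\pi : \pi \in \mathcal{L}(\mathcal{P})] \to \mathbb{R}(\theta_1, \theta_2, \ldots, \theta_n)$ defined by the formula (16). The
formula shows that the Plackett-Luce model is a submodel of the ascending model on $\mathcal{L}(\mathcal{P})$.
In fact, the ascending model is the toric closure of the Plackett-Luce model, by which we
mean that $\mathrm{As}_{\mathcal{P}}$ is the smallest toric model containing $\mathrm{PL}_{\mathcal{P}}$. The specialization map is

(17)  $t_{\pi(\{1,2,\ldots,i\})} \;\mapsto\; \frac{1}{\theta_{\pi(1)} + \theta_{\pi(2)} + \cdots + \theta_{\pi(i)}}.$

We fix $\mathbb{K} = \mathbb{C}$ and regard the Plackett-Luce model $\mathrm{PL}_{\mathcal{P}}$ as a projective variety in $\mathbb{P}^{|\mathcal{L}(\mathcal{P})|-1}$.
The toric closure property means that all binomials in $I_{\mathrm{PL}_{\mathcal{P}}}$ must lie in $I_{\mathrm{asc}}$, and this follows
from unique factorization in $\mathbb{R}[\theta_1, \ldots, \theta_n]$, given that the linear forms in (17) are distinct.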



In order for $\mathrm{PL}_{\mathcal{P}}$ to be properly defined as a statistical model, its probabilities should
sum to 1. For this we would need to identify the normalizing constant, which is the image
of $\sum_{\pi \in \mathcal{L}(\mathcal{P})} p_\pi$ under the map (16). A formula for this quantity can be derived, for many
situations of interest, from equations (25) and (26) in Hunter's article [21]. The most
general situation where the normalizing constant was determined can be found in [2]. They
make use of sophisticated methods from the algebraic and geometric theory of valuations
on cones. In our situation, $\sum_{\pi \in S_n} p_\pi$ is mapped to $(\theta_1 + \theta_2 + \cdots + \theta_n)/(\theta_1 \theta_2 \cdots \theta_n)$ under the ring map in (16).

Let us begin by examining the unconstrained case when $\mathcal{P}$ is an antichain, $Q = 2^{[n]}$
and $\mathcal{L}(\mathcal{P}) = \mathcal{M}(Q) = S_n$. This is the Plackett-Luce model $\mathrm{PL}_n$ familiar from the statistics
literature [21, 29]. With the correct normalizing constant, its parametrization equals

(18)  $p_\pi \;\mapsto\; \prod_{i=1}^{n} \frac{\theta_{\pi(i)}}{\sum_{j=1}^{i} \theta_{\pi(j)}} \qquad \text{for } \pi \in S_n.$

This defines a rational map from the non-negative orthant $\mathbb{R}^n_{\ge 0}$ to the $(n!-1)$-dimensional
simplex of probability distributions on the symmetric group $S_n$. We shall regard $\mathrm{PL}_n$ as a
complex projective variety in the ambient $\mathbb{P}^{n!-1}$. Being the image of a rational map from
$\mathbb{P}^{n-1}$, the dimension of this variety is at most $n - 1$. Theorem 7.4 shows that it equals $n - 1$.
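
The normalization of the unconstrained Plackett-Luce model is easy to check numerically. The Python sketch below is an illustration with a function name of our choosing. It assumes that the parametrization (18) sends a permutation $\pi$, written as a word, to the product over $i$ of $\theta_{\pi(i)} / (\theta_{\pi(1)} + \cdots + \theta_{\pi(i)})$, and it evaluates the probabilities in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def plackett_luce(theta):
    """Probabilities of all permutations of 1..n for positive parameters theta."""
    n = len(theta)
    probs = {}
    for perm in permutations(range(1, n + 1)):
        p = Fraction(1)
        partial = Fraction(0)
        for item in perm:
            partial += theta[item - 1]                 # theta_{pi(1)} + ... + theta_{pi(i)}
            p *= Fraction(theta[item - 1]) / partial
        probs[perm] = p
    return probs

probs = plackett_luce([1, 2, 3])
print(probs[(1, 2, 3)], probs[(3, 2, 1)], sum(probs.values()))
```

For the parameters $(1, 2, 3)$ the word 123 has probability $1 \cdot \tfrac{2}{3} \cdot \tfrac{3}{6} = \tfrac{1}{3}$, and the six probabilities add up to 1.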



Example 7.1 ($n = 3$). The Plackett-Luce model $\mathrm{PL}_3$ is a surface of degree 7 embedded in
5-dimensional projective space $\mathbb{P}^5$. The parameterization (16) of that surface is equivalent to

$$ \begin{array}{lll}
p_{123} \mapsto \theta_2\theta_3(\theta_1+\theta_3)(\theta_2+\theta_3), & p_{132} \mapsto \theta_2\theta_3(\theta_1+\theta_2)(\theta_2+\theta_3), & p_{213} \mapsto \theta_1\theta_3(\theta_1+\theta_3)(\theta_2+\theta_3), \\
p_{231} \mapsto \theta_1\theta_3(\theta_1+\theta_2)(\theta_1+\theta_3), & p_{312} \mapsto \theta_1\theta_2(\theta_1+\theta_2)(\theta_2+\theta_3), & p_{321} \mapsto \theta_1\theta_2(\theta_1+\theta_2)(\theta_1+\theta_3).
\end{array} $$

The defining ideal $I_{\mathrm{PL}_3}$ of $\mathrm{PL}_3$ is minimally generated by three quadratic polynomials, in
addition to the familiar cubic binomial that specifies the ambient ascending model:

$$ I_{\mathrm{PL}_3} = \langle\, p_{123}(p_{321} + p_{231}) - p_{213}(p_{132} + p_{312}),\;\; p_{312}(p_{123} + p_{213}) - p_{132}(p_{231} + p_{321}), $$
$$ \qquad\quad p_{231}(p_{132} + p_{312}) - p_{321}(p_{123} + p_{213}),\;\; p_{123}p_{231}p_{312} - p_{132}p_{321}p_{213} \,\rangle. $$

The singular locus of $\mathrm{PL}_3$ consists of the three isolated points $e_{321} - e_{231}$, $e_{123} - e_{213}$ and
$e_{132} - e_{312}$ in $\mathbb{P}^5$. In particular, there are no singular points with non-negative coordinates,
so this statistical model is a smooth surface in the 5-dimensional probability simplex.
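
That the four listed generators vanish on the model can be confirmed by substituting the parametrization. The Python sketch below is an illustration of ours: the function names are hypothetical, and the check evaluates the generators at a few rational parameter values rather than performing a symbolic computation.

```python
from fractions import Fraction

def pl3_point(t1, t2, t3):
    """Image of (theta1, theta2, theta3) under the parametrization of Example 7.1."""
    return {
        "123": t2 * t3 * (t1 + t3) * (t2 + t3),
        "132": t2 * t3 * (t1 + t2) * (t2 + t3),
        "213": t1 * t3 * (t1 + t3) * (t2 + t3),
        "231": t1 * t3 * (t1 + t2) * (t1 + t3),
        "312": t1 * t2 * (t1 + t2) * (t2 + t3),
        "321": t1 * t2 * (t1 + t2) * (t1 + t3),
    }

def generators(p):
    """The three quadrics and the cubic binomial, evaluated at the point p."""
    return [
        p["123"] * (p["321"] + p["231"]) - p["213"] * (p["132"] + p["312"]),
        p["312"] * (p["123"] + p["213"]) - p["132"] * (p["231"] + p["321"]),
        p["231"] * (p["132"] + p["312"]) - p["321"] * (p["123"] + p["213"]),
        p["123"] * p["231"] * p["312"] - p["132"] * p["321"] * p["213"],
    ]

for theta in [(1, 1, 1), (2, 3, 5), (Fraction(1, 2), 7, -3)]:
    print(theta, generators(pl3_point(*theta)))
```

Vanishing at sample points does not prove that these polynomials generate the ideal; it only confirms that they lie in it at the points tested.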

From the point of view of algebraic geometry, our parametrization map represents the
blow-up of the projective plane at the following configuration of nine special points:

(19)
  (0 : 0 : 1)     (0 : 1 : 0)     (1 : 0 : 0)
  (1 : -1 : 0)    (1 : 0 : -1)    (0 : 1 : -1)
  (1 : 1 : -1)    (1 : -1 : 1)    (-1 : 1 : 1)

This configuration has three 4-point lines and four 3-point lines. The map blows down the
three 4-point lines, and this creates a rational surface in $\mathbb{P}^5$ with three singular points.

From the point of view of commutative algebra, one might ask whether the four
generators of the ideal $I_{\mathrm{PL}_3}$ form a Gröbner basis with respect to some term order. A computation
reveals that this is not the case. However, we do get a square-free Gröbner basis for the
lexicographic term order with $p_{123} > p_{132} > p_{213} > p_{231} > p_{312} > p_{321}$. The initial ideal equals

$$ \mathrm{in}_{\mathrm{lex}}(I_{\mathrm{PL}_3}) = \langle p_{123}, p_{132}, p_{231} \rangle \cap \langle p_{123}, p_{132}, p_{312} \rangle \cap \langle p_{123}, p_{132}, p_{213} \rangle \,\cap $$
$$ \qquad \langle p_{123}, p_{213}, p_{231} \rangle \cap \langle p_{123}, p_{213}, p_{312} \rangle \cap \langle p_{123}, p_{312}, p_{321} \rangle \cap \langle p_{231}, p_{312}, p_{321} \rangle. $$

This represents a simplicial complex of seven triangles, listed in a shelling order, so $I_{\mathrm{PL}_3}$ is
Cohen-Macaulay. The Hilbert series of the ring $\mathbb{R}[p]/I_{\mathrm{PL}_3}$ equals $(1 + 3t + 3t^2)/(1 - t)^3$. □

Example 7.2 ($n = 4$). The Plackett-Luce model $\mathrm{PL}_4$ is a threefold of degree 191 in $\mathbb{P}^{23}$. It
is obtained from $\mathbb{P}^3$ by blowing up 55 lines. The homogeneous prime ideal $I_{\mathrm{PL}_4}$ that defines
$\mathrm{PL}_4$ is minimally generated by 105 quadrics and 75 cubics. Its Hilbert series equals

$$ \frac{1 + 20t + 105t^2 + 65t^3}{(1 - t)^4}. $$

We do not know whether $I_{\mathrm{PL}_n}$ is generated in degrees 2 and 3 for $n \ge 5$. □

Let us now turn to the general Plackett-Luce model with a given constraint poset $\mathcal{P}$,
so only permutations $\pi$ in $\mathcal{L}(\mathcal{P})$ are allowed. The model $\mathrm{PL}_{\mathcal{P}}$ is obtained from $\mathrm{PL}_n$ by
projecting onto those coordinates. Algebraically, the prime ideal $I_{\mathcal{P}}$ is obtained from $I_{\mathrm{PL}_n}$
by eliminating all unknowns $p_\pi$ where $\pi$ is a permutation that is not compatible with $\mathcal{P}$.

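Example 7.3. Let $n = 4$ and let $\mathcal{P}$ be the poset with two covering relations $1 < 2$ and $3 < 4$.
The corresponding distributive lattice $J(\mathcal{P})$ is the product of two chains of length 3. Note
that $\mathcal{L}(\mathcal{P})$ has six maximal chains, namely, the permutations that respect $1 < 2$ and $3 < 4$.
The corresponding unknowns are mapped to products of four linear forms as follows:

$$ \begin{array}{ll}
p_{1234} \mapsto \theta_3(\theta_1 + \theta_3)(\theta_3 + \theta_4)(\theta_1 + \theta_3 + \theta_4), & p_{1324} \mapsto \theta_3(\theta_1 + \theta_2)(\theta_3 + \theta_4)(\theta_1 + \theta_3 + \theta_4), \\
p_{1342} \mapsto \theta_3(\theta_1 + \theta_2)(\theta_3 + \theta_4)(\theta_1 + \theta_2 + \theta_3), & p_{3124} \mapsto \theta_1(\theta_1 + \theta_2)(\theta_3 + \theta_4)(\theta_1 + \theta_3 + \theta_4), \\
p_{3142} \mapsto \theta_1(\theta_1 + \theta_2)(\theta_3 + \theta_4)(\theta_1 + \theta_2 + \theta_3), & p_{3412} \mapsto \theta_1(\theta_1 + \theta_2)(\theta_1 + \theta_3)(\theta_1 + \theta_2 + \theta_3).
\end{array} $$

These reducible quartics meet in nine lines in $\mathbb{P}^3$, so the parametrization of $\mathrm{PL}_{\mathcal{P}}$ blows
these up. The ideal $I_{\mathcal{P}}$ is a complete intersection. Its minimal generators are the cubic

$$ p_{1234}p_{1342}p_{3142} + p_{1234}p_{3142}^2 + p_{1234}p_{3142}p_{3412} - p_{1234}p_{1324}p_{3412} - p_{1324}^2 p_{3412} - p_{1324}p_{3124}p_{3412} $$

and the binomial quadric $p_{1342}p_{3124} - p_{1324}p_{3142}$ that defines the ascending model on $\mathcal{P}$. □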
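
The two generators in this example can be checked against the parametrization by direct substitution. The following Python sketch is ours, with hypothetical function names; it evaluates the cubic and the quadric at sample parameter values, which confirms membership in the ideal at those points but is not a symbolic proof.

```python
from fractions import Fraction

def constrained_point(t1, t2, t3, t4):
    """Image of theta under the six products of linear forms in Example 7.3."""
    return {
        "1234": t3 * (t1 + t3) * (t3 + t4) * (t1 + t3 + t4),
        "1324": t3 * (t1 + t2) * (t3 + t4) * (t1 + t3 + t4),
        "1342": t3 * (t1 + t2) * (t3 + t4) * (t1 + t2 + t3),
        "3124": t1 * (t1 + t2) * (t3 + t4) * (t1 + t3 + t4),
        "3142": t1 * (t1 + t2) * (t3 + t4) * (t1 + t2 + t3),
        "3412": t1 * (t1 + t2) * (t1 + t3) * (t1 + t2 + t3),
    }

def cubic(p):
    return (p["1234"] * p["1342"] * p["3142"]
            + p["1234"] * p["3142"] ** 2
            + p["1234"] * p["3142"] * p["3412"]
            - p["1234"] * p["1324"] * p["3412"]
            - p["1324"] ** 2 * p["3412"]
            - p["1324"] * p["3124"] * p["3412"])

def quadric(p):
    return p["1342"] * p["3124"] - p["1324"] * p["3142"]

for theta in [(1, 1, 1, 1), (1, 2, 3, 4), (Fraction(2, 3), 5, -1, 7)]:
    point = constrained_point(*theta)
    print(theta, cubic(point), quadric(point))
```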

The following is our main result in this section. It should be useful for obtaining
information about the $(n-1)$-dimensional variety $\mathrm{PL}_{\mathcal{P}}$ and its homogeneous prime ideal $I_{\mathcal{P}}$.

Theorem 7.4. The parameterization $\mathbb{P}^{n-1} \dashrightarrow \mathrm{PL}_{\mathcal{P}} \subset \mathbb{P}^{|\mathcal{L}(\mathcal{P})|-1}$ of the Plackett-Luce model
on the poset $\mathcal{P}$ is given geometrically as the blowing up of $\mathbb{P}^{n-1}$ along an arrangement of
linear subspaces of codimension 2. These subspaces are defined by the equations $\sum_{i \in A} \theta_i =
\sum_{j \in B} \theta_j = 0$, where $\{A, B\}$ runs over all incomparable pairs in the distributive lattice on $\mathcal{P}$.



Proof. Let $\mathbb{R}[t]$ denote the polynomial ring of parameters in the ascending model (10). Its
indeterminates are $t_A$ where $A$ runs over subsets of $[n]$ that are order ideals in $\mathcal{P}$. We define
$M$ to be the Stanley-Reisner ideal of the distributive lattice of order ideals in $\mathcal{P}$. This is
the ideal in $\mathbb{R}[t]$ generated by the products $t_A t_B$ where $A$ and $B$ are incomparable, meaning
that neither $A \subseteq B$ nor $B \subseteq A$ holds. The Alexander dual of $M$ is the monomial ideal

$$
M^* = \bigcap_{\{A,B\}} \langle t_A, t_B \rangle,
$$

where the intersection is over all incomparable pairs of order ideals. The generators of $M^*$
correspond to the associated primes of $M$, so they are indexed by compatible permutations
$\pi \in \mathcal{L}(\mathcal{P})$. Interpreting $\pi$ as a maximal chain of order ideals, that correspondence is

$$
p_\pi \;\mapsto\; \prod_{A \notin \pi} t_A \qquad \text{for } \pi \in \mathcal{L}(\mathcal{P}). \tag{20}
$$

The arrangement of subspaces described in the statement of Theorem 7.4 is the intersection
of the variety of $M^*$ with a subspace $\mathbb{P}^{n-1}$ defined by $t_A = \sum_{i \in A} \theta_i$. Substituting this
into (20) we see that the blow-up along that subspace arrangement is defined by the map

$$
p_\pi \;\mapsto\; \prod_{A \notin \pi} \sum_{i \in A} \theta_i
\;=\; \mathrm{const} \cdot \prod_{A \in \pi} \frac{1}{\sum_{i \in A} \theta_i}
\qquad \text{for } \pi \in \mathcal{L}(\mathcal{P}). \tag{21}
$$

This is precisely the defining parametrization (16) of the Plackett-Luce model $\mathrm{PL}_{\mathcal{P}}$. □

Example 7.5. Let $n = 4$ and $\mathcal{P}$ as in Example 7.3. Then the above Stanley-Reisner ideal is

$$
M = \langle\, t_1t_3,\; t_3t_{12},\; t_{12}t_{13},\; t_1t_{34},\; t_{12}t_{34},\; t_{13}t_{34},\; t_{34}t_{123},\; t_{12}t_{134},\; t_{123}t_{134} \,\rangle.
$$

Its Alexander dual reveals the combinatorial pattern of the map in Example 7.3:

$$
M^* = \langle\, t_3t_{13}t_{34}t_{134},\; t_3t_{12}t_{34}t_{123},\; t_1t_{12}t_{34}t_{123},\; t_3t_{12}t_{34}t_{134},\; t_1t_{12}t_{34}t_{134},\; t_1t_{12}t_{13}t_{123} \,\rangle.
$$

The model $\mathrm{PL}_{\mathcal{P}}$ is the blow-up of $\mathbb{P}^3$ along nine lines, one for each of the generators of $M$. □
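The combinatorics of Example 7.5 can be reproduced by brute force. The sketch below is an illustration rather than the authors' computation: it lists the nonempty proper order ideals of the poset with $1 < 2$ and $3 < 4$, collects the incomparable pairs (the generators of $M$), and forms, for each maximal chain, the product of the variables not on that chain (the generators of $M^*$). The empty and the full ideal are omitted because they are comparable to every ideal. All function names are hypothetical.

```python
from itertools import combinations

def order_ideals(n, covers):
    """Nonempty proper down-sets of the poset on {1..n} given by covering relations (a, b), a < b."""
    ideals = []
    for size in range(1, n):
        for subset in combinations(range(1, n + 1), size):
            s = frozenset(subset)
            # Closure under covering relations implies closure under the whole order.
            if all(a in s for (a, b) in covers if b in s):
                ideals.append(s)
    return ideals

def incomparable_pairs(ideals):
    """Pairs {A, B} with neither A <= B nor B <= A."""
    return [(A, B) for A, B in combinations(ideals, 2) if not (A <= B or B <= A)]

def maximal_chains(n, ideals):
    """Chains A_1 < A_2 < ... < A_{n-1} of ideals with |A_k| = k."""
    chains = [[A] for A in ideals if len(A) == 1]
    for k in range(2, n):
        chains = [c + [A] for c in chains for A in ideals
                  if len(A) == k and c[-1] < A]
    return chains

def label(A):
    return "t" + "".join(str(i) for i in sorted(A))

n, covers = 4, [(1, 2), (3, 4)]
ideals = order_ideals(n, covers)
pairs = incomparable_pairs(ideals)

dual_generators = {}
for chain in maximal_chains(n, ideals):
    # Read off the permutation: the item added at each step of the chain.
    steps = [frozenset()] + chain + [frozenset(range(1, n + 1))]
    perm = "".join(str(next(iter(b - a))) for a, b in zip(steps, steps[1:]))
    # Generator of M*: product of t_A over the ideals A not on the chain.
    dual_generators[perm] = sorted(label(A) for A in ideals if A not in chain)

print(len(pairs), len(dual_generators))  # 9 incomparable pairs, 6 generators
for perm, gen in sorted(dual_generators.items()):
    print(perm, " ".join(gen))
```

Each generator found this way contains at least one variable from every incomparable pair, which is the membership condition for the intersection defining $M^*$.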
Each of our unconstrained ranking models was considered as a subvariety of the complex
projective space $\mathbb{P}^{n!-1}$. If $K$ is any $k$-element subset of $[n]$ then we obtain a natural rational
map $\mathbb{P}^{n!-1} \dashrightarrow \mathbb{P}^{k!-1}$ which records only the probabilities for each of the $k!$ orderings of $K$.
Statistically, this map corresponds to marginalization for the induced orderings on $K$. We
can now take the direct product of all of these maps, where $K$ runs over all $\binom{n}{k}$ subsets of
cardinality $k$ in $[n]$. The resulting rational map into a product of projective spaces,

$$
\mathbb{P}^{n!-1} \dashrightarrow \bigl(\mathbb{P}^{k!-1}\bigr)^{\binom{n}{k}}, \tag{22}
$$

is called the complete marginalization map of order $k$. For example, if $n = 3$ and $k = 2$
then we are mapping into a product of three projective lines, with coordinates $(q_{12} : q_{21})$,
$(q_{13} : q_{31})$ and $(q_{23} : q_{32})$ respectively. Here, the complete marginalization is the rational
map $\mathbb{P}^5 \dashrightarrow \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ which is given in coordinates as follows:

$$
\begin{aligned}
(q_{12} : q_{21}) &= (p_{123} + p_{132} + p_{312} \,:\, p_{213} + p_{231} + p_{321}), \\
(q_{13} : q_{31}) &= (p_{132} + p_{123} + p_{213} \,:\, p_{312} + p_{321} + p_{231}), \\
(q_{23} : q_{32}) &= (p_{123} + p_{213} + p_{231} \,:\, p_{132} + p_{312} + p_{321}).
\end{aligned}
$$

We shall refer to the complete marginalization of order 2 as the pairwise marginalization. 
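As an illustration of the marginalization maps, the following sketch computes the complete marginalization of order $k$ for a distribution on orderings of $[n]$. It assumes that a permutation is stored as a tuple listing the items in ranked order, so that $p_{312}$ is the entry for `(3, 1, 2)`. The function names and the sample weights are hypothetical and chosen only for the example.

```python
from fractions import Fraction
from itertools import combinations, permutations

def marginalize(p, K):
    """Push a distribution p on orderings of [n] forward to the induced orderings of K."""
    q = {}
    for perm, prob in p.items():
        induced = tuple(i for i in perm if i in K)
        q[induced] = q.get(induced, 0) + prob
    return q

def complete_marginalization(p, n, k):
    """One marginal distribution for each k-element subset K of [n]."""
    return {K: marginalize(p, set(K)) for K in combinations(range(1, n + 1), k)}

# A sample distribution on S_3 with weights 1..6 (an arbitrary choice).
n = 3
perms = list(permutations(range(1, n + 1)))
p = {perm: Fraction(w, 21) for perm, w in zip(perms, [1, 2, 3, 4, 5, 6])}

pairwise = complete_marginalization(p, n, 2)
for K, q in pairwise.items():
    print(K, dict(q))
```

For $n = 3$ and $k = 2$ the entry `pairwise[(1, 2)][(1, 2)]` is the sum $p_{123} + p_{132} + p_{312}$, in agreement with the coordinate formulas for the pairwise marginalization.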

Example 7.6. The pairwise marginalization of the Plackett-Luce surface $\mathrm{PL}_3 \subset \mathbb{P}^5$ is the
surface in $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ that is defined by the binomial equation $q_{12}q_{23}q_{31} = q_{21}q_{32}q_{13}$.
The composition of the map in Example 7.1 with the map in (22) is a toric rational map
$\mathbb{P}^2 \dashrightarrow \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ that blows up the three coordinate points $(1:0:0)$, $(0:1:0)$ and $(0:0:1)$. □
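The binomial equation in Example 7.6 can be checked in exact arithmetic. The sketch below assumes that the unconstrained Plackett-Luce model for $n = 3$ is obtained from the parametrization (21) with $\mathcal{P}$ an antichain, so that the ideals on the chain of a permutation $\pi$ are its nonempty proper prefixes and $p_\pi$ is proportional to the product of $1/\sum_{i \in A} \theta_i$ over these prefixes $A$. The parameter values and function names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import permutations

def plackett_luce(theta):
    """p_pi proportional to prod over nonempty proper prefixes A of pi of 1 / sum_{i in A} theta_i."""
    weights = {}
    for perm in permutations(sorted(theta)):
        w = Fraction(1)
        for k in range(1, len(perm)):
            w /= sum(theta[i] for i in perm[:k])
        weights[perm] = w
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}

def q(p, i, j):
    """Pairwise marginal: probability that item i is ranked before item j."""
    return sum(prob for perm, prob in p.items() if perm.index(i) < perm.index(j))

theta = {1: Fraction(2), 2: Fraction(3), 3: Fraction(7)}
p = plackett_luce(theta)

lhs = q(p, 1, 2) * q(p, 2, 3) * q(p, 3, 1)
rhs = q(p, 2, 1) * q(p, 3, 2) * q(p, 1, 3)
print(lhs == rhs)
print(q(p, 1, 2))
```

Under this convention a short calculation gives $q_{12} : q_{21} = \theta_2 : \theta_1$, so both sides of the binomial equation equal $\theta_1\theta_2\theta_3$ divided by the same product of pairwise sums.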

It is worthwhile, both algebraically and statistically, to study the various marginalizations
of the Csiszár model, the ascending model, the inversion model and the Plackett-Luce
model. Of particular interest is the pairwise marginalization of the Plackett-Luce
model. This is known in the literature as the Bradley-Terry model [21]. All of these
marginalized models make sense relative to a fixed constraint poset $\mathcal{P}$. Here, we regard
each $k$-set $K$ as a subposet of $\mathcal{P}$ and we write the corresponding marginalization map as

$$
\mathbb{P}^{|\mathcal{L}(\mathcal{P})|-1} \dashrightarrow \mathbb{P}^{|\mathcal{L}(K)|-1}. \tag{23}
$$

The complete $k$-th marginalization is the image of the direct product of these maps, as $K$
runs over all $k$-sets. For convenience, we shall here remove those $k$-sets $K$ that are totally
ordered in $\mathcal{P}$ because the corresponding maps in (23) are constant when $|\mathcal{L}(K)| = 1$.

We conclude this article with the following algebraic characterization of the Bradley-Terry
model. We consider the bidirected graph on $[n]$ where $(i, j)$ is a directed edge if
$i$ and $j$ are incomparable in $\mathcal{P}$. Each circuit $i_1, i_2, \ldots, i_r, i_1$ in this graph is encoded as a binomial:

$$
q_{i_1 i_2}\, q_{i_2 i_3} \cdots q_{i_r i_1} \;-\; q_{i_2 i_1}\, q_{i_3 i_2} \cdots q_{i_1 i_r}. \tag{24}
$$

These binomials define hypersurfaces in the corresponding product of projective lines. For
instance, the model in Example 7.6 is the toric hypersurface in $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ thus associated to a 3-cycle.

The theorem below refers to unimodular Lawrence ideals. This class of toric ideals was
introduced and studied by Bayer et al. in [3]. The associated toric varieties live naturally
in a product of projective lines $\mathbb{P}^1 \times \cdots \times \mathbb{P}^1$. The case of interest here is that of unimodular
Lawrence ideals arising from graphs. For these ideals and their syzygies we refer to [3, §5].
Theorem 7.7. The Bradley-Terry model with constraints $\mathcal{P}$ is toric. It is defined by the
unimodular Lawrence ideal whose generators are the circuits (24) in the bidirected graph
defined above.

From this result we can now determine the commutative algebra invariants of the
Bradley-Terry model, such as its Hilbert series in the $\mathbb{Z}^n$-grading and its multidegree.

Proof. Following [21], the parametrization of the Bradley-Terry model can be written as

$$
q_{ij} \;\mapsto\; \frac{\theta_j}{\theta_i + \theta_j} \qquad \text{for } i, j \text{ incomparable in } \mathcal{P}. \tag{25}
$$

Let $p_{\{i,j\}}$ be new unknowns indexed by unordered pairs $\{i,j\} \subset [n]$. The unimodular
Lawrence ideal associated with the bidirected graph is the kernel of the monomial map

$$
q_{ij} \;\mapsto\; p_{\{i,j\}} \cdot \theta_j \qquad \text{for } i, j \text{ incomparable in } \mathcal{P}. \tag{26}
$$

The specialization $p_{\{i,j\}} = (\theta_i + \theta_j)^{-1}$ turns (26) into (25). This shows that the ideal
$I_{\mathrm{BT}_{\mathcal{P}}}$ of the Bradley-Terry model contains the unimodular Lawrence ideal generated by the
circuits (24). In addition, the ideal $I_{\mathrm{BT}_{\mathcal{P}}}$ contains the linear polynomials $q_{ij} + q_{ji} - 1$.
These represent the fact that, in any compatible ranking $\pi$, either item $i$ ranks before item $j$
or vice versa, but not both.



Let $J$ be the ideal generated by the circuits (24) and these linear polynomials. We
have seen that $J \subseteq I_{\mathrm{BT}_{\mathcal{P}}}$, and we are claiming that equality holds. But this follows by
observing that both ideals are prime, and their varieties have the same dimension, namely
$n - 1$. Indeed, $I_{\mathrm{BT}_{\mathcal{P}}}$ is prime by definition, and $J$ is prime because adding the linear polynomials
$q_{ij} + q_{ji} - 1$ to the unimodular Lawrence ideal simply amounts to dehomogenizing from $\mathbb{P}^1$ to
$\mathbb{A}^1$ in each factor. Geometrically, this operation preserves the dimension of the variety. □
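The containment used in the proof can be illustrated on a small case. The sketch below assumes the Bradley-Terry parametrization in the form $q_{ij} = \theta_j/(\theta_i + \theta_j)$ and reads a circuit binomial as the product of the $q_{ij}$ along the circuit minus the product along the reversed circuit. It uses the poset with $1 < 2$ and $3 < 4$, whose incomparable pairs form the 4-cycle $1, 3, 2, 4$. Function names and parameter values are hypothetical.

```python
from fractions import Fraction

def bt_probability(theta, i, j):
    """Assumed form of the Bradley-Terry parametrization: q_ij = theta_j / (theta_i + theta_j)."""
    return theta[j] / (theta[i] + theta[j])

def circuit_binomial(theta, circuit):
    """Product of q along the closed circuit minus the product along the reversed circuit."""
    forward, backward = Fraction(1), Fraction(1)
    for i, j in zip(circuit, circuit[1:] + circuit[:1]):
        forward *= bt_probability(theta, i, j)
        backward *= bt_probability(theta, j, i)
    return forward - backward

theta = {1: Fraction(2), 2: Fraction(5), 3: Fraction(3), 4: Fraction(11)}

# Incomparable pairs for the poset 1 < 2, 3 < 4.
incomparable = [(1, 3), (1, 4), (2, 3), (2, 4)]

value = circuit_binomial(theta, [1, 3, 2, 4])
linear_ok = all(
    bt_probability(theta, i, j) + bt_probability(theta, j, i) == 1
    for i, j in incomparable
)
print(value, linear_ok)
```

Both products around a circuit have the same numerator, namely the product of all $\theta_i$ on the circuit, and the same denominator, so the binomial vanishes for every choice of positive parameters.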